We walk through examples of how our Full House Model era-adjusts batting average (BA) using the parametric method, and how it era-adjusts bWAR (Baseball-Reference Wins Above Replacement), fWAR (FanGraphs Wins Above Replacement), HR (home runs), and BB (walks) for batters, and bWAR, fWAR, ERA (earned run average), and SO (strikeouts) for pitchers, using the nonparametric method. We also provide details about how the tables and figures in the paper and supplementary materials are produced.
We first load the relevant software packages and the data. The data are collected from Baseball-Reference, FanGraphs, and the Chadwick GitHub repository. The details of data collection are in the Supplementary Materials. The relevant datasets used in this technical report can be found in the Tech_report_data
For batters, each row of data consists of a player ID, a year ID, a name, an age, a league ID (lgID), the recorded number of at bats (AB), games (G), plate appearances (PA), park-factored batting average (BA), walks (BB), observed hits (obs_hits), observed home runs (obs_HR), hit-by-pitch count (HBP), sacrifice bunts (SH), sacrifice flies (SF), park-factored hits (H), park-factored home runs (HR), Baseball-Reference WAR (bWAR), FanGraphs WAR (fWAR), and the size of the MLB-eligible population (pops).
For pitchers, each row of data consists of a player ID, a year ID, a name, an age, a league ID (lgID), a team ID (teamID), the recorded number of innings pitched (IP), games (G), observed earned runs (obs_ER), observed home runs (obs_HR), observed hits (obs_H), strikeouts (SO), park-factored earned runs (ER), park-factored home runs (HR_PF), park-factored hits (H_PF), walks (BB), hit-by-pitch count (HBP), Baseball-Reference WAR (bWAR), FanGraphs WAR (fWAR), and the size of the MLB-eligible population (pops).
rm(list=ls())
library(tidyverse)
library(orderstats)
library(Pareto)
library(doParallel)
library(splines)
library(retrosheet)
library(kableExtra)
library(Lahman)
ncores <- detectCores() - 1
years <- 1871:2023
load("pop_data.RData")
load("bat_dat.RData")
load("pit_dat.RData")
source('source.R')
In this analysis we assume that talent scores \(\stackrel{i i d}{\sim} \operatorname{Pareto}(\alpha)\) where \(\alpha=1.16\). For the parametric method we additionally assume that BA \(\stackrel{i i d}{\sim} N\left(\mu_{i}, \sigma_{i}\right)\). For the nonparametric method we make no assumptions on the distribution of the baseball statistics. This choice of \(\alpha\) corresponds to the Pareto principle, which is casually referred to as the \(80 / 20\) rule.
In this section, our Full House Model era-adjusts batting statistics such as bWAR, fWAR, HR, BB, and BA using a nonparametric distribution to measure the components. We also apply our Full House Model to era-adjust BA using a parametric distribution to measure the components.
We start with bWAR for batters. First we select the full-time batters: after screening out individuals who batted fewer than 75 PA, we take the median PA of the remaining players in each season as the cutoff for full-time batters.
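The link between \(\alpha = 1.16\) and the \(80/20\) rule can be checked numerically. For a Pareto distribution with shape \(\alpha > 1\), the share of the total held by the top fraction \(p\) of the population is \(p^{1 - 1/\alpha}\), so the exact 80/20 value is \(\alpha = \log 5 / \log 4\). The sketch below is only an illustration; the names `alpha_8020` and `top_share` are ours and do not appear in source.R.

```r
# Shape parameter implied by the 80/20 rule for a Pareto distribution.
alpha_8020 <- log(5) / log(4)

# Share of the total held by the top fraction p, valid for alpha > 1.
top_share <- function(p, alpha) p^(1 - 1 / alpha)

alpha_8020                   # close to the value 1.16 used in this report
top_share(0.20, alpha_8020)  # share held by the top 20% under the exact value
top_share(0.20, 1.16)        # share held by the top 20% under the rounded value
```

The rounded value 1.16 therefore gives a top-20% share very close to, but not exactly, 80%.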
batters <- bat_dat %>% select(yearID, playerID, lgID, name, age, PA, G, bWAR, pops)
cutoff <- do.call(rbind, mclapply(years, mc.cores = ncores, FUN = function(xx){
m <- batters %>% filter(yearID == xx) %>% filter(PA >= 75)
data.frame(thres = median(m$PA), yearID = xx)
}))
batters <- merge(batters, cutoff, by = 'yearID')
batters <- batters %>% mutate(comp = bWAR / G, full_time = ifelse(PA >= thres, 'Y', 'N'))
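As a small illustration of the full-time rule, the toy example below applies the same two steps (drop players under 75 PA, then take the median PA of the rest) to one made-up season. The data frame `toy` and its values are hypothetical, and base R is used so the snippet runs without the tidyverse.

```r
# Hypothetical single-season data: eight players and their plate appearances.
toy <- data.frame(
  playerID = paste0("p", 1:8),
  PA       = c(12, 40, 80, 150, 300, 450, 600, 650)
)

# Cutoff: median PA among players with at least 75 PA.
thres <- median(toy$PA[toy$PA >= 75])

# Players at or above the cutoff are flagged as full-time.
toy$full_time <- ifelse(toy$PA >= thres, "Y", "N")
toy
```

Here six players pass the 75 PA screen, the cutoff is the median of their PA, and the three players at or above it are flagged as full-time.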
In the 1994 and 1995 seasons, hitters and pitchers played fewer games than in a regular season. To deal with extreme statistics that arise from small sample sizes, we use a shrinkage method that adjusts the raw statistics toward a global average. Our shrinkage method follows the shrinkage of ballpark effect estimates in Michael Schell’s book, Baseball’s All-Time Best Sluggers. The method takes a weighted average of the raw component and the league average, which is
\[\textrm{adjusted component} = (\textrm{raw component} \times \textrm{total AB} + 4000 \times \textrm{league average})/(\textrm{total AB} + 4000) \]
When we apply the shrinkage method to our seasonal data, the shrinkage factor \(4000 / (\textrm{total AB} + 4000)\) is adjusted based on the ratio of 4000 to the total AB of the MLB teams. For example,
\[\textrm{adjusted bWAR per game} = (\textrm{raw bWAR per game} \times \textrm{games} + 7 \times \textrm{league average})/(\textrm{games} + 7) \]
batters_schell <- do.call(rbind, mclapply(years, mc.cores = ncores, FUN = function(xx){
int<- batters %>% filter(yearID == xx)
lg_avg <- sum(int$bWAR)/sum(int$G)
int %>% mutate(comp = (bWAR + lg_avg * 7)/(G + 7))
}))
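To see what the shrinkage does, the sketch below applies the same formula to three made-up players. The helper `shrink_rate` and the toy data are hypothetical; `k = 7` mirrors the constant used for bWAR per game above.

```r
# Hypothetical helper: shrink a per-game rate toward the league average.
# `total` is the season total (e.g. bWAR), `games` the games played, and
# `k` the number of pseudo-games added at the league-average rate.
shrink_rate <- function(total, games, lg_avg, k = 7) {
  (total + k * lg_avg) / (games + k)
}

toy <- data.frame(
  playerID = c("a", "b", "c"),
  bWAR     = c(0.5, 3.0, 6.0),
  G        = c(3, 80, 150)
)
lg_avg <- sum(toy$bWAR) / sum(toy$G)

toy$raw_comp <- toy$bWAR / toy$G
toy$adj_comp <- shrink_rate(toy$bWAR, toy$G, lg_avg)
toy
```

Player "a", with only 3 games, moves most of the way toward the league average, while the two players with many games are almost unchanged.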
The following script computes the talent scores for bWAR per game from all batters and all seasons (1871 to 2022).
batters_talent_bWAR <- do.call(rbind, mclapply(years, mc.cores = ncores, function(yy){
  # ystar computed from the full-time batters of season yy
  ystar_yy <- thresh_fun(component = batters_schell %>%
                           filter(full_time == 'Y', yearID == yy) %>% select(comp),
                         component_name = 'bWAR')
  talent_computing_nonpara(dataset = batters_schell, component_name = "bWAR",
                           year = yy, ystar = ystar_yy, alpha = 1.16)
})) %>% arrange(-WAR_talent)
Rather than using one specific season from 1871 to 2022 as the projected season, we build a common mapping environment of a given season size from no-strike seasons. To obtain era-adjusted statistics, we project the players’ talent into this common mapping environment. Following Michael Schell, we use the National League seasons from 1977 through 1989, with the exception of the 1981 strike season, as the common mapping environment. The number of teams is constant across these seasons, which avoids the effect of Major League expansion. We then fit an isotonic regression of the corresponding components on the ordered talent scores. In the common mapping environment, this model provides the association between the components and the talent scores.
We set the number of components in the common mapping environment equal to the total number of full-time players throughout all seasons. The reason is that the more components the common mapping environment contains, the closer \(\widetilde{F}_{Y_i}(t)\) is to \(F_{Y_i}(t)\). We select this number of components by quantile mapping, with equally spaced quantiles: the talent scores are taken at these quantiles, and the corresponding components are computed from the isotonic regression model. This completes the construction of the common mapping environment.
no_strike <- batters_talent_bWAR %>%
filter(yearID < 1990, yearID >= 1977, yearID != 1981,
full_time == 'Y', lgID == 'NL') %>%
arrange(WAR_talent)
t <- isoreg(no_strike$WAR_talent, no_strike$comp)
talent_new <- quantile(no_strike$WAR_talent, probs = (seq(0,250)/250))
comp_new <- as.stepfun(t)(talent_new)
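The two steps above (an isotonic fit, then evaluation on an equally spaced quantile grid) can be tried on simulated data. Everything in the sketch below is made up for illustration: the talent scores, the components, and the grid of 51 points are not taken from the report's data.

```r
set.seed(1)

# Simulated talent scores and noisy per-game components (hypothetical).
toy_talent <- sort(rexp(200))
toy_comp   <- 0.01 * log1p(toy_talent) + rnorm(200, sd = 0.002)

# Isotonic (non-decreasing) regression of the components on talent.
fit <- isoreg(toy_talent, toy_comp)
talent_to_comp <- as.stepfun(fit)   # step function mapping talent to component

# Equally spaced quantiles of talent and their fitted components.
grid_talent <- quantile(toy_talent, probs = seq(0, 50) / 50)
grid_comp   <- talent_to_comp(grid_talent)

head(data.frame(talent = grid_talent, comp = grid_comp))
```

Because the fit is isotonic, the fitted components on the grid never decrease as talent increases, even though the raw simulated components are noisy.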
The figure below shows the relationship between bWAR talent and bWAR per game. The black dots represent the full-time National League batters from the 1977 season to the 1989 season, with the exception of the 1981 strike season. The red dots represent the observations from the common mapping environment defined above. Based on the figure, the observations from the common mapping environment accurately depict the relationship between bWAR talent and bWAR per game from the 1977 season to the 1989 season, with the exception of the 1981 strike season.
mapping_envir <- data.frame(talent_new = talent_new, comp_new = comp_new)
ggplot(data = no_strike, aes(x = WAR_talent, y = comp)) + geom_point() +
geom_point(data = mapping_envir, aes(x = talent_new, y = comp_new), color = 'red') +
labs(x = 'bWAR talent', y = 'bWAR per game') +
scale_x_continuous(trans='log')
ystar <- thresh_fun(comp_new, component_name = 'bWAR')
pop_new <- round(mean(no_strike$pops))
component_name = 'bWAR'
yy <- sort(comp_new)
n <- length(yy)
ytilde <- rep(0, n + 1)
if (component_name == 'bWAR' | component_name == 'fWAR') {
ytilde[1] <- yy[1] - (yy[2] - yy[1])
}
if (component_name == 'HR' | component_name == 'BB') {
# HR and BB are bounded below by 0
ytilde[1] <- 0
}
ytilde[n+1] <- yy[n] + ystar
ytilde[2:n] <- unlist(lapply(2:n, function(j){
(yy[j]+yy[j-1])/2
}))
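The block above builds a vector `ytilde` of length `n + 1` from the sorted components: an extended lower end point, the midpoints of adjacent components, and an upper end point equal to the largest component plus `ystar`. The sketch below wraps the same arithmetic in a function and runs it on four made-up values. The function name `build_ytilde` and its `lower` argument are hypothetical, and `ystar` is treated as a plain number rather than the output of `thresh_fun`.

```r
# Hypothetical wrapper around the ytilde construction shown above.
# Requires at least two components.
build_ytilde <- function(comp, ystar, lower = c("extrapolate", "zero")) {
  lower <- match.arg(lower)
  yy <- sort(comp)
  n  <- length(yy)
  ytilde <- numeric(n + 1)
  # lower end: mirror the first gap (WAR-type) or use zero (count-type)
  ytilde[1] <- if (lower == "extrapolate") yy[1] - (yy[2] - yy[1]) else 0
  # interior points: midpoints of adjacent ordered components
  ytilde[2:n] <- (yy[2:n] + yy[1:(n - 1)]) / 2
  # upper end: largest component plus ystar
  ytilde[n + 1] <- yy[n] + ystar
  ytilde
}

knots <- build_ytilde(c(0.03, 0.01, 0.02, 0.05), ystar = 0.01)
knots
```

With `lower = "zero"` the first entry is set to 0, which matches the HR and BB branch of the original code.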
We now extend the method to compute hypothetical careers in the common mapping environment and obtain the era-adjusted statistics.
career_kAB_1st <- do.call(rbind, mclapply(1:30000, function(zz){
int <- career_talent_nonpara(dataset = batters_talent_bWAR, component_name = 'bWAR',
snippet = batters_talent_bWAR[zz,], alpha = 1.16)
int
}, mc.cores = ncores))
career_kAB_2nd <- do.call(rbind, mclapply(30001:60000, function(zz){
int <- career_talent_nonpara(dataset = batters_talent_bWAR, component_name = 'bWAR',
snippet = batters_talent_bWAR[zz,], alpha = 1.16)
int
}, mc.cores = ncores))
career_kAB_3rd <- do.call(rbind, mclapply(60001:nrow(batters_talent_bWAR), function(zz){
int <- career_talent_nonpara(dataset = batters_talent_bWAR, component_name = 'bWAR',
snippet = batters_talent_bWAR[zz,], alpha = 1.16)
int
}, mc.cores = ncores))
career_kAB <- rbind(career_kAB_1st, career_kAB_2nd, career_kAB_3rd)
Instead of using the raw G and PA in the data set, we calculate the mapped G and PA by applying quantile mapping to the full-time hitters and the non-full-time hitters separately. Quantile mapping assumes that the games played by a pth percentile player in a given year equal the games played by a pth percentile player in the common mapping environment.
We also specify that the worst bWAR performance for the full-time players in each season is -2.
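The quantile mapping of games can be sketched with `approx()` on a toy example. The helper `map_by_quantile` and both vectors of games are hypothetical. As in the code that follows, only the number of players in the target season and their rank order matter; the target's own game counts are replaced by the reference values at the same relative rank.

```r
# Hypothetical helper: assign to each rank in the target season the games
# played at the same relative rank in a reference season.
# Both seasons need at least two players.
map_by_quantile <- function(target, reference) {
  ref <- sort(reference, decreasing = TRUE)
  n   <- length(target)
  nr  <- length(ref)
  approx(x    = seq(nr - 1, 0) / (nr - 1),   # reference ranks scaled to [0, 1]
         y    = ref,
         xout = seq(n - 1, 0) / (n - 1))$y   # target ranks scaled to [0, 1]
}

# A 5-player season mapped onto a 9-player reference season.
ref_G    <- c(162, 158, 150, 141, 133, 120, 110, 101, 95)
target_G <- c(140, 132, 120, 97, 60)   # sorted, most games first
mapped_G <- map_by_quantile(target_G, ref_G)
round(mapped_G)
```

The player with the most games in the target season receives the reference maximum, the median player receives the reference median, and so on. The report repeats this for each reference season and averages the results with `rowMeans()`.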
## mapping statistics
mapped_quan_b_raw <- do.call(rbind, mclapply(years, function(xx){
batters_full <- batters %>% filter(yearID == xx) %>%
filter(full_time == 'Y') %>% arrange(-G)
batters_less <- batters %>% filter(yearID == xx) %>%
filter(full_time == 'N') %>% arrange(-G)
n1 <- nrow(batters_full)
n2 <- nrow(batters_less)
mapped_G_full <- c()
mapped_G_less <- c()
for (yy in c(1977:1980, 1982:1989)) {
batters_ref_full <- batters %>%
filter(yearID == yy, full_time == 'Y') %>% arrange(-G)
batters_ref_less <- batters %>%
filter(yearID == yy, full_time == 'N') %>% arrange(-G)
n1r <- nrow(batters_ref_full)
n2r <- nrow(batters_ref_less)
mapped_G_full <- cbind(mapped_G_full, approx(x = seq((n1r-1),0)/(n1r-1),
y = batters_ref_full$G,
xout = seq((n1-1),0)/(n1-1))$y)
mapped_G_less <- cbind(mapped_G_less, approx(x = seq((n2r-1),0)/(n2r-1),
y = batters_ref_less$G,
xout = seq((n2-1),0)/(n2-1))$y)
}
batters_full$mapped_G <- rowMeans(mapped_G_full)
batters_less$mapped_G <- rowMeans(mapped_G_less)
batters_full <- batters_full %>% arrange(-PA)
batters_less <- batters_less %>% arrange(-PA)
mapped_PA_full <- c()
mapped_PA_less <- c()
for (yy in c(1977:1980, 1982:1989)) {
batters_ref_full <- batters %>%
filter(yearID == yy, full_time == 'Y') %>% arrange(-PA)
batters_ref_less <- batters %>%
filter(yearID == yy, full_time == 'N') %>% arrange(-PA)
n1r <- nrow(batters_ref_full)
n2r <- nrow(batters_ref_less)
mapped_PA_full <- cbind(mapped_PA_full, approx(x = seq((n1r-1),0)/(n1r-1),
y = batters_ref_full$PA,
xout = seq((n1-1),0)/(n1-1))$y)
mapped_PA_less <- cbind(mapped_PA_less, approx(x = seq((n2r-1),0)/(n2r-1),
y = batters_ref_less$PA,
xout = seq((n2-1),0)/(n2-1))$y)
}
batters_full$mapped_PA <- rowMeans(mapped_PA_full)
batters_less$mapped_PA <- rowMeans(mapped_PA_less)
m <- rbind(batters_full, batters_less)
data.frame(playerID = m$playerID, yearID = m$yearID,
mapped_G_raw = round(m$mapped_G), mapped_PA_raw = round(m$mapped_PA))
}, mc.cores = ncores))
mapped_batters_1 <- merge(career_kAB, mapped_quan_b_raw,
by = c('playerID', 'yearID'))
min_refbWAR <- -2
mapped_batters_bWAR <- mapped_batters_1 %>%
mutate(adj_bWAR = adj_comp * mapped_G_raw) %>%
mutate(adj_bWAR = ifelse(adj_bWAR < min_refbWAR, min_refbWAR, adj_bWAR)) %>%
mutate(mapped_G_bWAR = round(adj_bWAR / adj_comp))
Then we apply the same techniques for fWAR.
We also apply our Full House Model to BB and HR. Before doing so, we obtain lower bounds on the walk rate and the home run rate. We isolate players with batting records in the common mapping environment and restrict attention to players with at least 10 years of batting records. We then compute the 0.03 quantile (3rd percentile) of the walk rate and the home run rate over these players’ seasons with at least 400 PA, and use these quantiles as the minimum allowable rates in the common mapping environment.
IDs7789 = Batting %>%
filter(yearID >= 1977, yearID <= 1989, yearID != 1981) %>%
pull(playerID)
IDs10 = Batting %>%
filter(playerID %in% IDs7789) %>%
group_by(playerID) %>%
summarise(n = n()) %>%
filter(n >= 10) %>%
pull(playerID)
lowerb <- Batting %>%
filter(playerID %in% IDs10) %>%
filter(yearID >= 1977, yearID <= 1989, yearID != 1981) %>%
mutate(PA = AB + BB + HBP + SH + SF) %>%
filter(PA >= 400) %>%
mutate(BBrate = BB/PA, HRrate = HR/AB) %>%
select(playerID, yearID, BBrate, HRrate) %>%
summarise(Q03BB = quantile(BBrate, probs = 0.03),
Q03HR = quantile(HRrate, probs = 0.03))
lowerb
## Q03BB Q03HR
## 1 0.03512232 0.001848296
Stephen Jay Gould suggests that BA in every season follows a normal distribution, and we perform the Shapiro-Wilk normality test to check this claim. At the 0.05 level, 18 of the 152 seasons from 1871 to 2022 fail the normality test. These seasons are 1876, 1895, 1896, 1904, 1908, 1912, 1916, 1922, 1924, 1941, 1946, 1959, 1961, 1967, 1972, 1987, 1994, and 1997. The code below runs over 1871 to 2023 and additionally flags the 2023 season.
batters <- bat_dat
batters <- merge(batters, cutoff, by = 'yearID')
batters <- batters %>% select(playerID, yearID, lgID, name, age, AB,
PA, obs_hits, H, thres, bWAR, pops) %>%
mutate(comp = ifelse(AB != 0, H / AB, 0), full_time = ifelse(PA >= thres, 'Y', 'N'))
batters_schell <- do.call(rbind, mclapply(years, mc.cores = ncores, FUN = function(xx){
int<- batters %>% filter(yearID == xx)
lg_avg <- sum(int$H)/sum(int$AB)
int %>% mutate(comp = (H + lg_avg * 25)/(AB + 25))
}))
normality <- do.call(rbind, mclapply(years, mc.cores = ncores, FUN = function(xx){
m <- batters_schell %>% filter(yearID == xx, full_time == 'Y') %>% pull(comp) %>% shapiro.test()
data.frame(yearID = xx, p_value = m$p.value)
}))
years[which(normality$p_value <= 0.05)]
## [1] 1876 1895 1896 1904 1908 1912 1916 1922 1924 1941 1946 1959 1961 1967 1972
## [16] 1987 1994 1997 2023
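For readers unfamiliar with `shapiro.test()`, the sketch below runs it on two simulated "seasons", one drawn from a normal distribution and one strongly right-skewed. The simulated values are hypothetical and only illustrate how the p-values are read at the 0.05 level.

```r
set.seed(42)

# Two simulated samples of 150 batting averages (hypothetical values).
sim <- list(
  normal = rnorm(150, mean = 0.265, sd = 0.025),
  skewed = 0.2 + rexp(150, rate = 20)
)

# Shapiro-Wilk p-value for each sample.
p_values <- sapply(sim, function(x) shapiro.test(x)$p.value)
p_values

# Samples flagged as non-normal at the 0.05 level.
names(p_values)[p_values <= 0.05]
```

When reading the list of flagged seasons, keep in mind that with 152 separate tests at the 0.05 level, about 152 × 0.05 ≈ 7.6 rejections would be expected by chance even if every season were exactly normal.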
Now we use both the parametric and the nonparametric distributions to measure BA.
We eliminate players who had an era-adjusted bWAR or fWAR below the replacement level for more than half of their career seasons. We also set three rules to eliminate some players’ poor early and late career seasons. The three rules are
The value 0.2 is calculated from the average bWAR or fWAR of the players who disappeared from MLB between the 1977 season and the 1989 season, with the exception of the 1981 season.
After obtaining the era-adjusted statistics, we find that some players’ statistics change dramatically in the tails of their careers, which is unrealistic. To address this, we considered smoothing methods that dampen these variations, such as local polynomial regression fitting and natural cubic splines. The natural cubic spline has the smallest bias and is considered the best option among the methods we compared.
We also notice that smoothing can weaken a player’s prime or extreme seasons. Therefore, we take the average of the smoothed era-adjusted statistics and the era-adjusted statistics, which preserves a player’s prime seasons while dampening the dramatic changes in the tails of their career.
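The way the 0.2 cutoff enters the tail-removal code in this report can be sketched on one made-up career. Each season scores 0, 1, or 2 according to how many of the two WAR measures are at or below the cutoff, and the number of seasons dropped from the end equals the number of trailing windows (of length 2 to 6) whose total score reaches the window's threshold. The helper `count_bad_tail` and the toy career are hypothetical.

```r
# Hypothetical helper mirroring the first tail rule in the trimming code:
# a window of the last k seasons qualifies when its score is at least 2k - 1.
count_bad_tail <- function(bWAR, fWAR, cutoff = 0.2) {
  bad <- (bWAR <= cutoff) + (fWAR <= cutoff)
  sum(sapply(2:6, function(k) sum(tail(bad, k)) >= 2 * k - 1))
}

# A made-up six-season career with two weak final seasons.
bWAR <- c(1.2, 3.5, 4.1, 2.0, 0.1, -0.3)
fWAR <- c(0.9, 3.8, 3.9, 1.7, 0.2, -0.1)

n_drop <- count_bad_tail(bWAR, fWAR)
keep   <- seq_len(length(bWAR) - n_drop)
keep
```

In this example only the two-season window qualifies, so a single trailing season is removed even though both final seasons are at or below the cutoff.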
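The smoothing-and-averaging idea can be sketched for one simulated career. The data and the choice `df = 4` are hypothetical and chosen only so the toy example has enough observations per spline parameter. The fit uses `ns()` from the splines package inside `lm()`, the same construction that appears in the commented-out lines of the code below.

```r
library(splines)

set.seed(7)

# Simulated era-adjusted WAR for one player over 15 seasons (hypothetical).
career <- data.frame(year = 1980:1994)
career$adj_WAR <- 5 - 0.08 * (career$year - 1987)^2 + rnorm(15, sd = 0.8)

# Natural cubic spline fit of adjusted WAR on year.
fit <- lm(adj_WAR ~ ns(year, df = 4), data = career)
career$smooth_WAR <- predict(fit, newdata = career)

# Average of the raw and smoothed values.
career$final_WAR <- (career$adj_WAR + career$smooth_WAR) / 2
career
```

Each averaged value lies between the raw and smoothed values, so a peak season is pulled only halfway toward the spline rather than being flattened to it.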
AVG_part <- mapped_batters_AVG_nonpara %>%
mutate(adj_AVG = round(adj_AVG, 3)) %>%
select(yearID, playerID, name, adj_AVG)
HR_part <- mapped_batters_HR %>%
mutate(mapped_PA = round(mapped_PA)) %>%
mutate(adj_HR = round(adj_HR)) %>%
mutate(adj_AB = round(adj_AB)) %>%
select(yearID, playerID, adj_HR, adj_AB, HBP, SF, mapped_PA)
bWAR_part <- mapped_batters_bWAR %>%
mutate(adj_bWAR = round(adj_bWAR, 2)) %>%
select(yearID, playerID, adj_bWAR, mapped_G_bWAR)
fWAR_part <- mapped_batters_fWAR %>%
mutate(adj_fWAR = round(adj_fWAR, 2)) %>%
select(yearID, playerID, age, adj_fWAR, mapped_G_fWAR)
BB_part <- mapped_batters_HR %>%
mutate(adj_BB = round(adj_BB)) %>%
select(yearID, playerID, BB, adj_BB)
master_batters <- merge(BB_part, merge(AVG_part,
merge(HR_part, merge(bWAR_part, fWAR_part,
by = c('yearID', 'playerID')),
by = c('yearID', 'playerID')),
by = c('yearID', 'playerID')),
by = c('yearID', 'playerID'))
master_batters <- master_batters %>%
mutate(adj_OBP = round((adj_AVG * adj_AB + adj_BB + HBP) / (adj_AB + adj_BB + HBP + SF), 3))
master_batters$adj_OBP[is.na(master_batters$adj_OBP)] <- 0
master_batters_nonpara <- master_batters %>% mutate(adj_BB = ifelse(adj_BB < 0, 0, adj_BB)) %>%
mutate(adj_AB = ifelse(mapped_PA < adj_AB, mapped_PA, adj_AB))
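# row-wise minimum of the two mapped game counts (columns 13 and 16 after the merges)
master_batters_nonpara$mapped_G <- pmin(master_batters_nonpara$mapped_G_bWAR,
                                        master_batters_nonpara$mapped_G_fWAR)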
## Batters
batters <- master_batters_nonpara
## extract and remove bad players
foo <- batters %>%
arrange(desc(adj_AVG)) %>%
filter(adj_AB >= 300) %>%
dplyr::select(name, playerID, yearID, adj_AB, adj_AVG, adj_OBP, adj_HR, adj_fWAR, adj_bWAR) %>%
mutate(adj_HR_AB = round(adj_HR/adj_AB,4))
bar <- split(foo, as.factor(foo$playerID))
baz <- do.call(rbind, lapply(bar, function(m){
m[which.max(m$adj_fWAR), ]
}))
baz <- baz %>% arrange(adj_fWAR)
bad_players_fWAR <- baz %>% filter(adj_fWAR < 0) %>% pull(playerID)
baz <- do.call(rbind, lapply(bar, function(m){
m[which.max(m$adj_bWAR), ]
}))
baz <- baz %>% arrange(adj_bWAR)
bad_players_bWAR <- baz %>% filter(adj_bWAR < 0) %>% pull(playerID)
bad_players <- union(bad_players_bWAR, bad_players_fWAR)
batters <- batters[!batters$playerID %in% bad_players, ]
###### investigate anomalies ######
## more on base events than PAs
# check average for minimal at bats and correct issues
batters[batters$adj_AVG > 0 & batters$adj_AB == 0, ]$adj_AVG <- 0
batters <- batters %>% mutate(adj_hits = round(adj_AVG * adj_AB))
batters[batters$adj_AB > 0, ]$adj_AVG <-
round(batters[batters$adj_AB > 0, ]$adj_hits / batters[batters$adj_AB > 0, ]$adj_AB, 3)
## build adjusted data set
batters_adjusted <- batters %>%
dplyr::select(name, playerID, age, yearID, mapped_PA, adj_AB, adj_hits, adj_HR, adj_BB,
adj_AVG, adj_OBP, HBP, SF, adj_bWAR, adj_fWAR)
colnames(batters_adjusted) <- c("name", "playerID", "age", "year", "mapped_PA", "adj_AB", "adj_H",
"adj_HR", "adj_BB", "adj_AVG", "adj_OBP", "HBP", "SF", "adj_bWAR", "adj_fWAR")
batters_adjusted$playerID <- droplevels(as.factor(batters_adjusted$playerID))
## trim out bad players
# first round
foo <- split(batters_adjusted, f = batters_adjusted$playerID)
bar <- lapply(foo, function(m){
ifelse(m$adj_bWAR <= 0, 1, 0) + ifelse(m$adj_fWAR <= 0, 1, 0)
})
checker <- data.frame(pid = levels(batters_adjusted$playerID),
m_bad = unlist(lapply(bar, mean)),
len = unlist(lapply(bar, length)))
batters_adjusted <- batters_adjusted %>%
filter(!batters_adjusted$playerID %in% rownames(checker)[checker$m_bad == 2])
batters_adjusted$playerID <- droplevels(batters_adjusted$playerID)
# second round
foo <- split(batters_adjusted, f = batters_adjusted$playerID)
bar <- lapply(foo, function(m){
ifelse(m$adj_bWAR <= 0, 1, 0) + ifelse(m$adj_fWAR <= 0, 1, 0)
})
checker <- data.frame(pid = levels(batters_adjusted$playerID),
m_bad = unlist(lapply(bar, mean)),
len = unlist(lapply(bar, length)))
batters_adjusted <- batters_adjusted %>%
filter(!batters_adjusted$playerID %in% rownames(checker)[checker$m_bad >= 1 & checker$len <= 2])
batters_adjusted$playerID <- droplevels(batters_adjusted$playerID)
# third round
foo <- split(batters_adjusted, f = batters_adjusted$playerID)
bar <- lapply(foo, function(m){
min(ifelse(m$adj_bWAR <= 0, 1, 0) + ifelse(m$adj_fWAR <= 0, 1, 0))
})
checker <- data.frame(pid = levels(batters_adjusted$playerID),
m_bad = unlist(lapply(bar, mean)),
len = unlist(lapply(bar, length)))
batters_adjusted <- batters_adjusted %>%
filter(!batters_adjusted$playerID %in% rownames(checker)[checker$m_bad >= 1])
batters_adjusted$playerID <- droplevels(batters_adjusted$playerID)
# fourth round
foo <- split(batters_adjusted, f = batters_adjusted$playerID)
bar <- lapply(foo, function(m){
ifelse(m$adj_bWAR == -2, 1, 0) + ifelse(m$adj_fWAR == -2, 1, 0)
})
checker <- data.frame(pid = levels(batters_adjusted$playerID),
m_bad = unlist(lapply(bar, mean)),
len = unlist(lapply(bar, length)))
batters_adjusted <- batters_adjusted %>%
filter(!batters_adjusted$playerID %in% rownames(checker)[checker$m_bad >= 1])
batters_adjusted$playerID <- droplevels(batters_adjusted$playerID)
## remove tails
foo <- split(batters_adjusted, f = batters_adjusted$playerID)
bar <- lapply(foo, function(m){
bad <- ifelse(m$adj_bWAR <= 0.2, 1, 0) + ifelse(m$adj_fWAR <= 0.2, 1, 0)
bad_tail <- sum(c(ifelse(sum(tail(bad, 2)) >= 3,1,0),
ifelse(sum(tail(bad, 3)) >= 5,1,0),
ifelse(sum(tail(bad, 4)) >= 7,1,0),
ifelse(sum(tail(bad, 5)) >= 9,1,0),
ifelse(sum(tail(bad, 6)) >= 11,1,0)))
1:(length(bad)-bad_tail)
})
batters_adjusted_1 <- do.call(rbind, lapply(1:length(bar), function(j){
foo[[j]][bar[[j]], ]
})) %>% arrange(year)
foo <- split(batters_adjusted_1, f = batters_adjusted_1$playerID)
bar <- lapply(foo, function(m){
bad <- ifelse(m$adj_fWAR <= -1.5, 1, 0)
bad_tail <- sum(c(ifelse(sum(tail(bad, 2)) >= 2,1,0),
ifelse(sum(tail(bad, 3)) >= 3,1,0),
ifelse(sum(tail(bad, 4)) >= 4,1,0),
ifelse(sum(tail(bad, 5)) >= 5,1,0),
ifelse(sum(tail(bad, 6)) >= 6,1,0)))
1:(length(bad)-bad_tail)
})
batters_adjusted_2 <- do.call(rbind, lapply(1:length(bar), function(j){
foo[[j]][bar[[j]], ]
})) %>% arrange(year)
foo <- split(batters_adjusted_2, f = batters_adjusted_2$playerID)
bar <- lapply(foo, function(m){
bad <- ifelse(m$adj_bWAR <= -1.5, 1, 0)
bad_tail <- sum(c(ifelse(sum(tail(bad, 2)) >= 2,1,0),
ifelse(sum(tail(bad, 3)) >= 3,1,0),
ifelse(sum(tail(bad, 4)) >= 4,1,0),
ifelse(sum(tail(bad, 5)) >= 5,1,0),
ifelse(sum(tail(bad, 6)) >= 6,1,0)))
1:(length(bad)-bad_tail)
})
batters_adjusted_3 <- do.call(rbind, lapply(1:length(bar), function(j){
foo[[j]][bar[[j]], ]
})) %>% arrange(year)
## remove starts
foo <- split(batters_adjusted_3, f = batters_adjusted_3$playerID)
bar <- lapply(foo, function(m){
bad <- ifelse(m$adj_bWAR <= 0, 1, 0) + ifelse(m$adj_fWAR <= 0, 1, 0)
bad_head <- sum(c(ifelse(sum(head(bad, 1)) == 2,1,0),
ifelse(sum(head(bad, 2)) >= 3,1,0),
ifelse(sum(head(bad, 3)) >= 5,1,0),
ifelse(sum(head(bad, 4)) >= 7,1,0),
ifelse(sum(head(bad, 5)) >= 9,1,0),
ifelse(sum(head(bad, 6)) >= 11,1,0)))
if (bad_head < length(bad)) {
(bad_head + 1):length(bad)
}
})
batters_adjusted_4 <- do.call(rbind, lapply(1:length(bar), function(j){
foo[[j]][bar[[j]], ]
})) %>% arrange(year)
batters_adjusted_4$playerID <- droplevels(batters_adjusted_4$playerID)
# taper down average WAR for players with small PAs
batters_adjusted_4[batters_adjusted_4$mapped_PA <= 20, ]$adj_fWAR <-
round(batters_adjusted_4[batters_adjusted_4$mapped_PA <= 20, ]$adj_fWAR/9,2)
batters_adjusted_4[batters_adjusted_4$mapped_PA <= 20, ]$adj_bWAR <-
round(batters_adjusted_4[batters_adjusted_4$mapped_PA <= 20, ]$adj_bWAR/9,2)
batters_adjusted <- do.call(rbind, mclapply(
split(batters_adjusted_4, f = droplevels(as.factor(batters_adjusted_4$playerID))),
mc.cores = ncores, FUN = function(xx){
## natural cubic spline
#ns_AVG = lm(adj_AVG ~ ns(year, df=6), data=xx)
#nn_AVG <- predict(ns_AVG, data.frame("year"= xx$year))
#ns_HR = lm(adj_HR ~ ns(year, df=6), data=xx)
#nn_HR <- predict(ns_HR, data.frame("year"= xx$year))
#ns_BB = lm(adj_BB ~ ns(year, df=6), data=xx)
#nn_BB <- predict(ns_BB, data.frame("year"= xx$year))
#ns_bWAR = lm(adj_bWAR ~ ns(year, df=6), data=xx)
#nn_bWAR <- predict(ns_bWAR, data.frame("year"= xx$year))
#ns_fWAR = lm(adj_fWAR ~ ns(year, df=6), data=xx)
#nn_fWAR <- predict(ns_fWAR, data.frame("year"= xx$year))
xx %>% mutate(AVG = round(adj_AVG, 3)) %>%
mutate(HR = round(adj_HR)) %>%
mutate(BB = round(adj_BB)) %>%
mutate(ebWAR = round(adj_bWAR, 2)) %>%
mutate(efWAR = round(adj_fWAR, 2))
}))
bat_season <- batters_adjusted %>%
mutate(PA = adj_AB + BB + HBP + SF)
bat_season[bat_season$HR >= bat_season$adj_AB, ]$HR <-
bat_season[bat_season$HR >= bat_season$adj_AB, ]$adj_AB
bat_season <- bat_season %>%
dplyr::select(-c("adj_H", "adj_HR", "adj_BB", "adj_AVG", "adj_OBP", "adj_bWAR", "adj_fWAR", 'mapped_PA'))
bat_season <- bat_season %>% mutate(OBP = round((AVG * adj_AB + BB + HBP)/(adj_AB + BB + HBP + SF), 3 )) %>%
mutate(AVG = ifelse(AVG > 0, AVG, 0)) %>%
mutate(HR = ifelse(HR > 0, HR, 0)) %>%
mutate(BB = ifelse(BB > 0, BB, 0)) %>%
mutate(OBP = ifelse(OBP > 0, OBP, 0))
colnames(bat_season)[5] <- 'AB'
colnames(bat_season)[8] <- 'BA'
bat_season_nonpara <- bat_season %>% mutate(H = ceiling(AB * BA)) %>%
mutate(BA = round(H / AB, 3)) %>%
mutate(BA = ifelse(AB == 0, 0, BA)) %>%
mutate(OBP = round((H+BB+HBP)/(AB+BB+HBP+SF), 3)) %>%
mutate(OBP = ifelse(AB+BB+HBP+SF == 0, 0, OBP))
bat_career_nonpara <- bat_season_nonpara %>% group_by(playerID) %>%
summarise(name = unique(name),
playerID = unique(playerID),
PA = sum(round(PA)),
AB = sum(AB),
H = sum(H),
HR = sum(round(HR)),
BB = sum(round(BB)),
BA = round(H/AB, 3),
HBP = sum(HBP),
SF = sum(SF),
OBP = round((H + BB + HBP)/(AB + BB + HBP + SF), 3),
ebWAR = sum(ebWAR),
efWAR = sum(efWAR)) %>% ungroup() %>%
arrange(desc(ebWAR))
bat_career_nonpara <- bat_career_nonpara %>% mutate(BA = ifelse(AB == 0, 0, BA))
bat_career_nonpara <- bat_career_nonpara %>% mutate(OBP = ifelse(AB + BB + HBP + SF == 0, 0, OBP))
AVG_part <- mapped_batters_AVG_para %>%
mutate(adj_AVG = round(adj_AVG, 3)) %>%
select(yearID, playerID, name, adj_AVG)
HR_part <- mapped_batters_HR %>%
mutate(mapped_PA = round(mapped_PA)) %>%
mutate(adj_HR = round(adj_HR)) %>%
mutate(adj_AB = round(adj_AB)) %>%
select(yearID, playerID, adj_HR, adj_AB, HBP, SF, mapped_PA)
bWAR_part <- mapped_batters_bWAR %>%
mutate(adj_bWAR = round(adj_bWAR, 2)) %>%
select(yearID, playerID, adj_bWAR, mapped_G_bWAR)
fWAR_part <- mapped_batters_fWAR %>%
mutate(adj_fWAR = round(adj_fWAR, 2)) %>%
select(yearID, playerID, age, adj_fWAR, mapped_G_fWAR)
BB_part <- mapped_batters_HR %>%
mutate(adj_BB = round(adj_BB)) %>%
select(yearID, playerID, BB, adj_BB)
master_batters <- merge(BB_part, merge(AVG_part,
merge(HR_part, merge(bWAR_part, fWAR_part,
by = c('yearID', 'playerID')),
by = c('yearID', 'playerID')),
by = c('yearID', 'playerID')),
by = c('yearID', 'playerID'))
master_batters <- master_batters %>%
mutate(adj_OBP = round((adj_AVG * adj_AB + adj_BB + HBP) / (adj_AB + adj_BB + HBP + SF), 3))
master_batters$adj_OBP[is.na(master_batters$adj_OBP)] <- 0
master_batters_para <- master_batters %>% mutate(adj_BB = ifelse(adj_BB < 0, 0, adj_BB)) %>%
mutate(adj_AB = ifelse(mapped_PA < adj_AB, mapped_PA, adj_AB))
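# row-wise minimum of the two mapped game counts (columns 13 and 16 after the merges)
master_batters_para$mapped_G <- pmin(master_batters_para$mapped_G_bWAR,
                                     master_batters_para$mapped_G_fWAR)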
## Batters
batters <- master_batters_para
## extract and remove bad players
foo <- batters %>%
arrange(desc(adj_AVG)) %>%
filter(adj_AB >= 300) %>%
dplyr::select(name, playerID, yearID, adj_AB, adj_AVG, adj_OBP, adj_HR, adj_fWAR, adj_bWAR) %>%
mutate(adj_HR_AB = round(adj_HR/adj_AB,4))
bar <- split(foo, as.factor(foo$playerID))
baz <- do.call(rbind, lapply(bar, function(m){
m[which.max(m$adj_fWAR), ]
}))
baz <- baz %>% arrange(adj_fWAR)
bad_players_fWAR <- baz %>% filter(adj_fWAR < 0) %>% pull(playerID)
baz <- do.call(rbind, lapply(bar, function(m){
m[which.max(m$adj_bWAR), ]
}))
baz <- baz %>% arrange(adj_bWAR)
bad_players_bWAR <- baz %>% filter(adj_bWAR < 0) %>% pull(playerID)
bad_players <- union(bad_players_bWAR, bad_players_fWAR)
batters <- batters[!batters$playerID %in% bad_players, ]
###### investigate anomalies ######
## more on base events than PAs
# check average for minimal at bats and correct issues
batters[batters$adj_AVG > 0 & batters$adj_AB == 0, ]$adj_AVG <- 0
batters <- batters %>% mutate(adj_hits = round(adj_AVG * adj_AB))
batters[batters$adj_AB > 0, ]$adj_AVG <-
round(batters[batters$adj_AB > 0, ]$adj_hits / batters[batters$adj_AB > 0, ]$adj_AB, 3)
## build adjusted data set
batters_adjusted <- batters %>%
dplyr::select(name, playerID, age, yearID, mapped_PA, adj_AB, adj_hits, adj_HR, adj_BB,
adj_AVG, adj_OBP, HBP, SF, adj_bWAR, adj_fWAR)
colnames(batters_adjusted) <- c("name", "playerID", "age", "year", "mapped_PA", "adj_AB", "adj_H",
"adj_HR", "adj_BB", "adj_AVG", "adj_OBP", "HBP", "SF", "adj_bWAR", "adj_fWAR")
batters_adjusted$playerID <- droplevels(as.factor(batters_adjusted$playerID))
## trim out bad players
# first round
foo <- split(batters_adjusted, f = batters_adjusted$playerID)
bar <- lapply(foo, function(m){
ifelse(m$adj_bWAR <= 0, 1, 0) + ifelse(m$adj_fWAR <= 0, 1, 0)
})
checker <- data.frame(pid = levels(batters_adjusted$playerID),
m_bad = unlist(lapply(bar, mean)),
len = unlist(lapply(bar, length)))
batters_adjusted <- batters_adjusted %>%
filter(!batters_adjusted$playerID %in% rownames(checker)[checker$m_bad == 2])
batters_adjusted$playerID <- droplevels(batters_adjusted$playerID)
# second round
foo <- split(batters_adjusted, f = batters_adjusted$playerID)
bar <- lapply(foo, function(m){
ifelse(m$adj_bWAR <= 0, 1, 0) + ifelse(m$adj_fWAR <= 0, 1, 0)
})
checker <- data.frame(pid = levels(batters_adjusted$playerID),
m_bad = unlist(lapply(bar, mean)),
len = unlist(lapply(bar, length)))
batters_adjusted <- batters_adjusted %>%
filter(!batters_adjusted$playerID %in% rownames(checker)[checker$m_bad >= 1 & checker$len <= 2])
batters_adjusted$playerID <- droplevels(batters_adjusted$playerID)
# third round
foo <- split(batters_adjusted, f = batters_adjusted$playerID)
bar <- lapply(foo, function(m){
min(ifelse(m$adj_bWAR <= 0, 1, 0) + ifelse(m$adj_fWAR <= 0, 1, 0))
})
checker <- data.frame(pid = levels(batters_adjusted$playerID),
m_bad = unlist(lapply(bar, mean)),
len = unlist(lapply(bar, length)))
batters_adjusted <- batters_adjusted %>%
filter(!batters_adjusted$playerID %in% rownames(checker)[checker$m_bad >= 1])
batters_adjusted$playerID <- droplevels(batters_adjusted$playerID)
# fourth round
foo <- split(batters_adjusted, f = batters_adjusted$playerID)
bar <- lapply(foo, function(m){
ifelse(m$adj_bWAR == -2, 1, 0) + ifelse(m$adj_fWAR == -2, 1, 0)
})
checker <- data.frame(pid = levels(batters_adjusted$playerID),
m_bad = unlist(lapply(bar, mean)),
len = unlist(lapply(bar, length)))
batters_adjusted <- batters_adjusted %>%
filter(!batters_adjusted$playerID %in% rownames(checker)[checker$m_bad >= 1])
batters_adjusted$playerID <- droplevels(batters_adjusted$playerID)
## remove tails
foo <- split(batters_adjusted, f = batters_adjusted$playerID)
bar <- lapply(foo, function(m){
bad <- ifelse(m$adj_bWAR <= 0.2, 1, 0) + ifelse(m$adj_fWAR <= 0.2, 1, 0)
bad_tail <- sum(c(ifelse(sum(tail(bad, 2)) >= 3,1,0),
ifelse(sum(tail(bad, 3)) >= 5,1,0),
ifelse(sum(tail(bad, 4)) >= 7,1,0),
ifelse(sum(tail(bad, 5)) >= 9,1,0),
ifelse(sum(tail(bad, 6)) >= 11,1,0)))
1:(length(bad)-bad_tail)
})
batters_adjusted_1 <- do.call(rbind, lapply(1:length(bar), function(j){
foo[[j]][bar[[j]], ]
})) %>% arrange(year)
foo <- split(batters_adjusted_1, f = batters_adjusted_1$playerID)
bar <- lapply(foo, function(m){
bad <- ifelse(m$adj_fWAR <= -1.5, 1, 0)
bad_tail <- sum(c(ifelse(sum(tail(bad, 2)) >= 2,1,0),
ifelse(sum(tail(bad, 3)) >= 3,1,0),
ifelse(sum(tail(bad, 4)) >= 4,1,0),
ifelse(sum(tail(bad, 5)) >= 5,1,0),
ifelse(sum(tail(bad, 6)) >= 6,1,0)))
1:(length(bad)-bad_tail)
})
batters_adjusted_2 <- do.call(rbind, lapply(1:length(bar), function(j){
foo[[j]][bar[[j]], ]
})) %>% arrange(year)
foo <- split(batters_adjusted_2, f = batters_adjusted_2$playerID)
bar <- lapply(foo, function(m){
bad <- ifelse(m$adj_bWAR <= -1.5, 1, 0)
bad_tail <- sum(c(ifelse(sum(tail(bad, 2)) >= 2,1,0),
ifelse(sum(tail(bad, 3)) >= 3,1,0),
ifelse(sum(tail(bad, 4)) >= 4,1,0),
ifelse(sum(tail(bad, 5)) >= 5,1,0),
ifelse(sum(tail(bad, 6)) >= 6,1,0)))
1:(length(bad)-bad_tail)
})
batters_adjusted_3 <- do.call(rbind, lapply(1:length(bar), function(j){
foo[[j]][bar[[j]], ]
})) %>% arrange(year)
## remove starts
foo <- split(batters_adjusted_3, f = batters_adjusted_3$playerID)
bar <- lapply(foo, function(m){
bad <- ifelse(m$adj_bWAR <= 0, 1, 0) + ifelse(m$adj_fWAR <= 0, 1, 0)
bad_head <- sum(c(ifelse(sum(head(bad, 1)) == 2,1,0),
ifelse(sum(head(bad, 2)) >= 3,1,0),
ifelse(sum(head(bad, 3)) >= 5,1,0),
ifelse(sum(head(bad, 4)) >= 7,1,0),
ifelse(sum(head(bad, 5)) >= 9,1,0),
ifelse(sum(head(bad, 6)) >= 11,1,0)))
if (bad_head < length(bad)) {
(bad_head + 1):length(bad)
}
})
batters_adjusted_4 <- do.call(rbind, lapply(1:length(bar), function(j){
foo[[j]][bar[[j]], ]
})) %>% arrange(year)
batters_adjusted_4$playerID <- droplevels(batters_adjusted_4$playerID)
# taper down average WAR for players with small PAs
batters_adjusted_4[batters_adjusted_4$mapped_PA <= 20, ]$adj_fWAR <-
round(batters_adjusted_4[batters_adjusted_4$mapped_PA <= 20, ]$adj_fWAR/9,2)
batters_adjusted_4[batters_adjusted_4$mapped_PA <= 20, ]$adj_bWAR <-
round(batters_adjusted_4[batters_adjusted_4$mapped_PA <= 20, ]$adj_bWAR/9,2)
batters_adjusted <- do.call(rbind, mclapply(
split(batters_adjusted_4, f = droplevels(as.factor(batters_adjusted_4$playerID))),
mc.cores = ncores, FUN = function(xx){
## natural cubic spline
#ns_AVG = lm(adj_AVG ~ ns(year, df=6), data=xx)
#nn_AVG <- predict(ns_AVG, data.frame("year"= xx$year))
#ns_HR = lm(adj_HR ~ ns(year, df=6), data=xx)
#nn_HR <- predict(ns_HR, data.frame("year"= xx$year))
#ns_BB = lm(adj_BB ~ ns(year, df=6), data=xx)
#nn_BB <- predict(ns_BB, data.frame("year"= xx$year))
#ns_bWAR = lm(adj_bWAR ~ ns(year, df=6), data=xx)
#nn_bWAR <- predict(ns_bWAR, data.frame("year"= xx$year))
#ns_fWAR = lm(adj_fWAR ~ ns(year, df=6), data=xx)
#nn_fWAR <- predict(ns_fWAR, data.frame("year"= xx$year))
xx %>% mutate(AVG = round(adj_AVG, 3)) %>%
mutate(HR = round(adj_HR)) %>%
mutate(BB = round(adj_BB)) %>%
mutate(ebWAR = round(adj_bWAR, 2)) %>%
mutate(efWAR = round(adj_fWAR, 2))
}))
bat_season <- batters_adjusted %>%
mutate(PA = adj_AB + BB + HBP + SF)
bat_season[bat_season$HR >= bat_season$adj_AB, ]$HR <-
bat_season[bat_season$HR >= bat_season$adj_AB, ]$adj_AB
bat_season <- bat_season %>%
dplyr::select(-c("adj_H", "adj_HR", "adj_BB", "adj_AVG", "adj_OBP", "adj_bWAR", "adj_fWAR", 'mapped_PA'))
bat_season <- bat_season %>% mutate(OBP = round((AVG * adj_AB + BB + HBP)/(adj_AB + BB + HBP + SF), 3 )) %>%
mutate(AVG = ifelse(AVG > 0, AVG, 0)) %>%
mutate(HR = ifelse(HR > 0, HR, 0)) %>%
mutate(BB = ifelse(BB > 0, BB, 0)) %>%
mutate(OBP = ifelse(OBP > 0, OBP, 0))
colnames(bat_season)[5] <- 'AB'
colnames(bat_season)[8] <- 'BA'
bat_season_para <- bat_season %>% mutate(H = ceiling(AB * BA)) %>%
mutate(BA = round(H / AB, 3)) %>%
mutate(BA = ifelse(AB == 0, 0, BA)) %>%
mutate(OBP = round((H+BB+HBP)/(AB+BB+HBP+SF), 3)) %>%
mutate(OBP = ifelse(AB+BB+HBP+SF == 0, 0, OBP))
bat_career_para <- bat_season_para %>% group_by(playerID) %>%
summarise(name = unique(name),
playerID = unique(playerID),
PA = sum(round(PA)),
AB = sum(AB),
H = sum(H),
HR = sum(round(HR)),
BB = sum(round(BB)),
BA = round(H/AB, 3),
HBP = sum(HBP),
SF = sum(SF),
OBP = round((H + BB + HBP)/(AB + BB + HBP + SF), 3),
ebWAR = sum(ebWAR),
efWAR = sum(efWAR)) %>% ungroup() %>%
arrange(desc(ebWAR))
bat_career_para <- bat_career_para %>% mutate(BA = ifelse(AB == 0, 0, BA))
bat_career_para <- bat_career_para %>% mutate(OBP = ifelse(AB + BB + HBP + SF == 0, 0, OBP))
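To make the first tail-removal rule above concrete, here is a minimal sketch on a toy career. The helper name `count_bad_tail` and the toy WAR values are hypothetical and introduced only for illustration. The logic mirrors the rule as written: a season scores one point for each of adjusted bWAR and adjusted fWAR at or below 0.2, and the last 2 to 6 seasons are checked against the thresholds 3, 5, 7, 9 and 11.

```r
# Hypothetical helper: number of trailing seasons the first tail rule drops.
count_bad_tail <- function(adj_bWAR, adj_fWAR, cutoff = 0.2) {
  # each season scores 0, 1 or 2: one point per WAR measure at or below the cutoff
  bad <- as.integer(adj_bWAR <= cutoff) + as.integer(adj_fWAR <= cutoff)
  # the last k seasons (k = 2, ..., 6) must score at least 2k - 1 in total
  windows <- 2:6
  sum(sapply(windows, function(k) sum(tail(bad, k)) >= 2 * k - 1))
}

# toy career (not real data): four productive seasons followed by three poor ones
bWAR <- c(3.1, 4.0, 2.5, 1.2, 0.1, -0.3, 0.0)
fWAR <- c(2.9, 4.2, 2.0, 1.0, 0.2, -0.1, 0.1)
n_drop <- count_bad_tail(bWAR, fWAR)
keep <- seq_len(length(bWAR) - n_drop)
```

Because the number of dropped seasons is the count of satisfied windows and the smallest window is two seasons, three poor closing seasons lead to two dropped rows in this toy case, so `keep` covers the first five seasons.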
Career results for the top 15 bWAR batting leaders, with BA measured using the non-parametric distribution.
| playerID | name | PA | AB | H | HR | BB | BA | HBP | SF | OBP | ebWAR | efWAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bondsba01 | Barry Bonds | 12740 | 10170 | 3049 | 654 | 2373 | 0.300 | 106 | 91 | 0.434 | 153.89 | 145.24 |
| mayswi01 | Willie Mays | 12814 | 11151 | 3475 | 577 | 1528 | 0.312 | 44 | 91 | 0.394 | 144.08 | 135.39 |
| aaronha01 | Henry Aaron | 14113 | 12540 | 3887 | 689 | 1420 | 0.310 | 32 | 121 | 0.378 | 135.60 | 128.05 |
| ruthba01 | Babe Ruth | 10829 | 8884 | 2672 | 702 | 1902 | 0.301 | 43 | 0 | 0.426 | 127.29 | 120.44 |
| rodrial01 | Alex Rodriguez | 11917 | 10293 | 3047 | 547 | 1338 | 0.296 | 176 | 110 | 0.383 | 120.29 | 110.30 |
| musiast01 | Stan Musial | 13036 | 11420 | 3579 | 492 | 1510 | 0.313 | 53 | 53 | 0.394 | 119.51 | 113.03 |
| cobbty01 | Ty Cobb | 12721 | 11659 | 3726 | 247 | 971 | 0.320 | 91 | 0 | 0.376 | 114.48 | 108.77 |
| pujolal01 | Albert Pujols | 13201 | 11432 | 3522 | 662 | 1523 | 0.308 | 123 | 123 | 0.391 | 111.86 | 97.34 |
| schmimi01 | Mike Schmidt | 10310 | 8517 | 2331 | 561 | 1606 | 0.274 | 79 | 108 | 0.390 | 109.58 | 106.41 |
| henderi01 | Rickey Henderson | 13760 | 11331 | 3195 | 286 | 2264 | 0.282 | 98 | 67 | 0.404 | 109.08 | 103.90 |
| willite01 | Ted Williams | 10184 | 8258 | 2593 | 503 | 1867 | 0.314 | 39 | 20 | 0.442 | 107.86 | 107.75 |
| speaktr01 | Tris Speaker | 12003 | 10723 | 3172 | 195 | 1179 | 0.296 | 101 | 0 | 0.371 | 102.26 | 95.13 |
| morgajo02 | Joe Morgan | 11522 | 9444 | 2648 | 308 | 1942 | 0.280 | 40 | 96 | 0.402 | 100.17 | 96.07 |
| robinfr02 | Frank Robinson | 11945 | 10213 | 3001 | 535 | 1432 | 0.294 | 198 | 102 | 0.388 | 99.93 | 95.92 |
| ottme01 | Mel Ott | 11058 | 9370 | 2657 | 418 | 1624 | 0.284 | 64 | 0 | 0.393 | 99.74 | 95.72 |
| playerID | name | PA | AB | H | HR | BB | BA | HBP | SF | OBP | ebWAR | efWAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bondsba01 | Barry Bonds | 12740 | 10170 | 3044 | 654 | 2373 | 0.299 | 106 | 91 | 0.434 | 153.89 | 145.24 |
| mayswi01 | Willie Mays | 12814 | 11151 | 3457 | 577 | 1528 | 0.310 | 44 | 91 | 0.392 | 144.08 | 135.39 |
| aaronha01 | Henry Aaron | 14113 | 12540 | 3923 | 689 | 1420 | 0.313 | 32 | 121 | 0.381 | 135.60 | 128.05 |
| ruthba01 | Babe Ruth | 10829 | 8884 | 2692 | 702 | 1902 | 0.303 | 43 | 0 | 0.428 | 127.29 | 120.44 |
| rodrial01 | Alex Rodriguez | 11917 | 10293 | 3039 | 547 | 1338 | 0.295 | 176 | 110 | 0.382 | 120.29 | 110.30 |
| musiast01 | Stan Musial | 13036 | 11420 | 3589 | 492 | 1510 | 0.314 | 53 | 53 | 0.395 | 119.51 | 113.03 |
| cobbty01 | Ty Cobb | 12721 | 11659 | 3876 | 247 | 971 | 0.332 | 91 | 0 | 0.388 | 114.48 | 108.77 |
| pujolal01 | Albert Pujols | 13201 | 11432 | 3510 | 662 | 1523 | 0.307 | 123 | 123 | 0.391 | 111.86 | 97.34 |
| schmimi01 | Mike Schmidt | 10310 | 8517 | 2339 | 561 | 1606 | 0.275 | 79 | 108 | 0.390 | 109.58 | 106.41 |
| henderi01 | Rickey Henderson | 13760 | 11331 | 3180 | 286 | 2264 | 0.281 | 98 | 67 | 0.403 | 109.08 | 103.90 |
| willite01 | Ted Williams | 10184 | 8258 | 2596 | 503 | 1867 | 0.314 | 39 | 20 | 0.442 | 107.86 | 107.75 |
| speaktr01 | Tris Speaker | 12003 | 10723 | 3223 | 195 | 1179 | 0.301 | 101 | 0 | 0.375 | 102.26 | 95.13 |
| morgajo02 | Joe Morgan | 11522 | 9444 | 2652 | 308 | 1942 | 0.281 | 40 | 96 | 0.402 | 100.17 | 96.07 |
| robinfr02 | Frank Robinson | 11945 | 10213 | 3010 | 535 | 1432 | 0.295 | 198 | 102 | 0.388 | 99.93 | 95.92 |
| ottme01 | Mel Ott | 11058 | 9370 | 2662 | 418 | 1624 | 0.284 | 64 | 0 | 0.393 | 99.74 | 95.72 |
We apply our Full House Model to bWAR, fWAR, SO, and ERA for pitchers, using the non-parametric distribution to measure each component.
The career results for the top 15 bWAR pitching leaders.
| playerID | name | IP | ER | ERA | K | ebWAR | efWAR |
|---|---|---|---|---|---|---|---|
| clemero02 | Roger Clemens | 5456 | 1701 | 2.81 | 4752 | 145.88 | 141.25 |
| maddugr01 | Greg Maddux | 5646 | 1739 | 2.77 | 3473 | 113.66 | 120.73 |
| johnsra05 | Randy Johnson | 4724 | 1521 | 2.90 | 5136 | 110.81 | 109.77 |
| seaveto01 | Tom Seaver | 4587 | 1482 | 2.91 | 3656 | 104.31 | 90.78 |
| grovele01 | Lefty Grove | 3518 | 1085 | 2.78 | 2826 | 102.54 | 98.80 |
| verlaju01 | Justin Verlander | 4107 | 1283 | 2.81 | 3297 | 100.23 | 95.07 |
| blylebe01 | Bert Blyleven | 4877 | 1721 | 3.18 | 3785 | 97.69 | 101.82 |
| niekrph01 | Phil Niekro | 5082 | 1776 | 3.15 | 3364 | 94.37 | 77.47 |
| kershcl01 | Clayton Kershaw | 3484 | 941 | 2.43 | 2980 | 93.78 | 88.83 |
| johnswa01 | Walter Johnson | 4791 | 1766 | 3.32 | 3888 | 91.53 | 91.80 |
| spahnwa01 | Warren Spahn | 5112 | 1832 | 3.23 | 2955 | 91.20 | 72.30 |
| scherma01 | Max Scherzer | 3658 | 1150 | 2.83 | 3506 | 90.63 | 82.14 |
| greinza01 | Zack Greinke | 4278 | 1433 | 3.01 | 2916 | 90.23 | 80.28 |
| perryga01 | Gaylord Perry | 4977 | 1768 | 3.20 | 3366 | 89.50 | 94.45 |
| carltst01 | Steve Carlton | 4816 | 1641 | 3.07 | 4221 | 88.70 | 100.34 |
| rank | ebWAR | efWAR | bWAR | fWAR | ESPN | Hall of Stats |
|---|---|---|---|---|---|---|
| 1 | Barry Bonds | Barry Bonds | Babe Ruth | Babe Ruth | Babe Ruth | Babe Ruth |
| 2 | Roger Clemens | Roger Clemens | Walter Johnson | Barry Bonds | Willie Mays | Barry Bonds |
| 3 | Willie Mays | Willie Mays | Cy Young | Willie Mays | Hank Aaron | Walter Johnson |
| 4 | Babe Ruth | Henry Aaron | Barry Bonds | Ty Cobb | Ty Cobb | Willie Mays |
| 5 | Henry Aaron | Greg Maddux | Willie Mays | Honus Wagner | Ted Williams | Cy Young |
| 6 | Alex Rodriguez | Babe Ruth | Ty Cobb | Hank Aaron | Lou Gehrig | Ty Cobb |
| 7 | Stan Musial | Stan Musial | Hank Aaron | Roger Clemens | Mickey Mantle | Hank Aaron |
| 8 | Ty Cobb | Alex Rodriguez | Roger Clemens | Cy Young | Barry Bonds | Roger Clemens |
| 9 | Greg Maddux | Randy Johnson | Tris Speaker | Tris Speaker | Walter Johnson | Rogers Hornsby |
| 10 | Albert Pujols | Ty Cobb | Honus Wagner | Ted Williams | Stan Musial | Honus Wagner |
| 11 | Randy Johnson | Nolan Ryan | Stan Musial | Rogers Hornsby | Pedro Martinez | Tris Speaker |
| 12 | Mike Schmidt | Ted Williams | Rogers Hornsby | Stan Musial | Honus Wagner | Ted Williams |
| 13 | Rickey Henderson | Mike Schmidt | Eddie Collins | Eddie Collins | Ken Griffey Jr. | Stan Musial |
| 14 | Ted Williams | Rickey Henderson | Ted Williams | Walter Johnson | Greg Maddux | Eddie Collins |
| 15 | Tom Seaver | Bert Blyleven | Pete Alexander | Greg Maddux | Mike Trout | Pete Alexander |
| 16 | Lefty Grove | Steve Carlton | Alex Rodriguez | Lou Gehrig | Joe DiMaggio | Alex Rodriguez |
| 17 | Tris Speaker | Lefty Grove | Kid Nichols | Alex Rodriguez | Roger Clemens | Lou Gehrig |
| 18 | Justin Verlander | Albert Pujols | Lou Gehrig | Mickey Mantle | Mike Schmidt | Mickey Mantle |
| 19 | Joe Morgan | Joe Morgan | Rickey Henderson | Mel Ott | Frank Robinson | Lefty Grove |
| 20 | Frank Robinson | Frank Robinson | Mel Ott | Randy Johnson | Rogers Hornsby | Mel Ott |
| 21 | Mel Ott | Mel Ott | Mickey Mantle | Nolan Ryan | Cy Young | Rickey Henderson |
| 22 | Bert Blyleven | Tris Speaker | Tom Seaver | Mike Schmidt | Tom Seaver | Kid Nichols |
| 23 | Cal Ripken Jr | Justin Verlander | Frank Robinson | Rickey Henderson | Rickey Henderson | Mike Schmidt |
| 24 | Rogers Hornsby | Gaylord Perry | Nap Lajoie | Frank Robinson | Randy Johnson | Nap Lajoie |
| 25 | Lou Gehrig | Rogers Hornsby | Mike Schmidt | Bert Blyleven | Christy Mathewson | Christy Mathewson |
| pre-1950 in top 10 | 3 | 3 | 6 | 6 | 6 | 6 |
| pre-1950 in top 25 | 9 | 8 | 15 | 12 | 11 | 17 |
| proportion before 1950 | 0.298 | 0.298 | 0.298 | 0.298 | 0.298 | 0.298 |
| chance in top 10 | 1 in 1.63 | 1 in 1.63 | 1 in 21.79 | 1 in 21.79 | 1 in 21.79 | 1 in 21.79 |
| chance in top 25 | 1 in 3.17 | 1 in 2.09 | 1 in 606.01 | 1 in 23.73 | 1 in 10.65 | 1 in 11126.97 |
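The "chance" rows in the table above are numerically consistent with an upper-tail binomial probability: if each of the n ranked slots independently belonged to a pre-1950 player with probability equal to the pre-1950 share of the MLB-eligible population, how likely is it to see at least the observed number of pre-1950 players? The sketch below rests on that reading, which is an assumption since the computation itself is not shown in this section, and `chance_one_in` is a hypothetical helper name.

```r
# Hypothetical helper: "1 in x" odds of seeing at least `k` pre-1950 players
# among the top `n`, assuming independent slots with success probability `p`.
chance_one_in <- function(k, n, p) {
  # P(X >= k) for X ~ Binomial(n, p)
  tail_prob <- pbinom(k - 1, size = n, prob = p, lower.tail = FALSE)
  1 / tail_prob
}

# ebWAR column: 3 pre-1950 players in the top 10, population share 0.298
ebwar_top10 <- round(chance_one_in(3, 10, 0.298), 2)
# bWAR column: 6 pre-1950 players in the top 10, same population share
bwar_top10 <- round(chance_one_in(6, 10, 0.298), 2)
```

Under this assumption the two calls should reproduce the 1 in 1.63 and 1 in 21.79 entries of the table. A value close to 1 means the observed count is unsurprising given the population share, while a large value flags a list that over-represents early players.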
| rank | Peak in Full House | Career in Full House | Schell Method 1 | Schell Method 2 | Era-bridging Method | Raw Career |
|---|---|---|---|---|---|---|
| 1 | Rod Carew | Tony Gwynn | Tony Gwynn | Tony Gwynn | Ty Cobb | Ty Cobb |
| 2 | Ichiro Suzuki | Rod Carew | Ty Cobb | Ty Cobb | Tony Gwynn | Rogers Hornsby |
| 3 | Jose Altuve | Jose Altuve | Rod Carew | Rod Carew | Ted Williams | Shoeless Joe Jackson |
| 4 | Albert Pujols | Ichiro Suzuki | Shoeless Joe Jackson | Rogers Hornsby | Wade Boggs | Lefty O’Doul |
| 5 | Joe Mauer | Miguel Cabrera | Rogers Hornsby | Stan Musial | Rod Carew | Ed Delahanty |
| 6 | Josh Hamilton | Roberto Clemente | Ted Williams | Nap Lajoie | Shoeless Joe Jackson | Tris Speaker |
| 7 | Miguel Cabrera | Ty Cobb | Honus Wagner | Shoeless Joe Jackson | Nap Lajoie | Billy Hamilton |
| 8 | Trea Turner | Joe DiMaggio | Stan Musial | Honus Wagner | Stan Musial | Ted Williams |
| 9 | Harry Walker | Wade Boggs | Wade Boggs | Ted Williams | Frank Thomas | Dan Brouthers |
| 10 | Jeff McNeil | Buster Posey | Nap Lajoie | Wade Boggs | Ed Delahanty | Babe Ruth |
| 11 | Tony Gwynn | Mike Trout | Tris Speaker | Pete Browning | Tris Speaker | Dave Orr |
| 12 | John Olerud | Freddie Freeman | Pete Browning | Tris Speaker | Rogers Hornsby | Harry Heilmann |
| 13 | José Reyes | Joe Mauer | Willie Mays | Mike Piazza | Hank Aaron | Pete Browning |
| 14 | Alex Rodriguez | Ted Williams | Dan Brouthers | Dan Brouthers | Alex Rodriguez | Willie Keeler |
| 15 | Keith Hernandez | Stan Musial | Kirby Puckett | Tip O’Neill | Pete Rose | Bill Terry |
| 16 | Mookie Betts | Willie Mays | Babe Ruth | Kirby Puckett | Honus Wagner | Lou Gehrig |
| 17 | Pete Rose | Bill Terry | Tip O’Neill | Tony Oliva | Roberto Clemente | George Sisler |
| 18 | Dee Strange Gordon | Robinson Canó | Willie Keeler | Vladimir Guerrero | George Brett | Jesse Burkett |
| 19 | Edgar Martinez | Henry Aaron | Joe DiMaggio | Mike Donlin | Don Mattingly | Tony Gwynn |
| 20 | Stan Musial | Matty Alou | Tony Oliva | Willie Keeler | Kirby Puckett | Nap Lajoie |
| 21 | Ken Griffey | Vladimir Guerrero | Jesse Burkett | Edgar Martinez | Mike Piazza | Jake Stenzel |
| 22 | Willie McGee | Derek Jeter | Eddie Collins | Henry Aaron | Eddie Collins | Riggs Stephenson |
| 23 | Luis Arraez | Al Oliver | George Sisler | Derek Aaron | Edgar Martinez | Al Simmons |
| 24 | Robin Yount | Lou Gehrig | Lou Gehrig | Joe DiMaggio | Paul Molitor | Cap Anson |
| 25 | Derrek Lee | Edgar Martinez | Don Mattingly | Babe Ruth | Willie Mays | John McGraw |
| pre-1950 in top 10 | 1 | 2 | 7 | 7 | 6 | 10 |
| pre-1950 in top 25 | 2 | 6 | 18 | 15 | 10 | 24 |
| proportion before 1950 | 0.219 | 0.259 | 0.425 | 0.345 | 0.319 | 0.259 |
| MLB-eligible population | 1871-2023 | 1871-2012 | 1876-1984 | 1876-1997 | 1901-1996 | 1871-2012 |
| chance in top 10 | 1 in 1.09 | 1 in 1.29 | 1 in 13.2 | 1 in 41.72 | 1 in 15.92 | 1 in 736211.1 |
| chance in top 25 | 1 in 1.02 | 1 in 1.51 | 1 in 363.72 | 1 in 124.55 | 1 in 3.97 | 1 in 6415384084573.36 |
| rank | Peak in Full House | Career in Full House | Era-bridging Method | PPS detrending method | Peak in Schell | Career in Schell | Raw AB per HR |
|---|---|---|---|---|---|---|---|
| 1 | Babe Ruth | Babe Ruth | Mark McGwire | Babe Ruth | Barry Bonds | Babe Ruth | Mark McGwire |
| 2 | Willie Stargell | Mark McGwire | Juan Gonzalez | Mel Ott | Babe Ruth | Mark McGwire | Babe Ruth |
| 3 | Willie Mays | Giancarlo Stanton | Babe Ruth | Lou Gehrig | Mark McGwire | Ted Williams | Barry Bonds |
| 4 | Aaron Judge | Dave Kingman | Dave Kingman | Jimmie Foxx | Buck Freeman | Barry Bonds | Jim Thome |
| 5 | Giancarlo Stanton | Ralph Kiner | Mike Schmidt | Hank Aaron | Ed Delahanty | Mike Schmidt | Ralph Kiner |
| 6 | José Bautista | Mike Schmidt | Harmon Killebrew | Rogers Hornsby | Tim Jordan | Lou Gehrig | Harmon Killebrew |
| 7 | Mark McGwire | Willie Stargell | Frank Thomas | Cy Williams | Willie Stargell | Harmon Killebrew | Sammy Sosa |
| 8 | Chris Davis | Barry Bonds | Jose Canseco | Barry Bonds | Rogers Hornsby | Jimmie Foxx | Ted Williams |
| 9 | Luke Voit | Jimmie Foxx | Ron Kittle | Willie Mays | Jim Thome | Dave Kingman | Manny Ramirez |
| 10 | Ted Williams | Mike Trout | Willie Stargell | Ted Williams | Dave Kingman | Reggie Jackson | Adam Dunn |
| 11 | Eddie Mathews | Ted Williams | Willie McCovey | Reggie Jackson | Roy Sievers | Bill Nicholson | Ryan Howard |
| 12 | Khris Davis | David Ortiz | Darryl Strawberry | Mike Schmidt | Jeff Bagwell | Mickey Mantle | Juan Gonzalez |
| 13 | Bryce Harper | Willie McCovey | Bo Jackson | Frank Robinson | Ted Williams | Ralph Kiner | Dave Kingman |
| 14 | Mike Schmidt | Harmon Killebrew | Ted Williams | Harmon Killebrew | Kevin Mitchell | Joe DiMaggio | Russell Branyan |
| 15 | David Ortiz | Mickey Mantle | Ralph Kiner | Gavvy Cravath | Mike Schmidt | Willie Stargell | Mickey Mantle |
| 16 | Kevin Mitchell | Hank Greenberg | Pat Seerey | Honus Wagner | Lou Gehrig | Hack Wilson | Alex Rodriguez |
| 17 | Albert Pujols | Darryl Strawberry | Reggie Jackson | Willie McCovey | Fred Dunlap | Rogers Hornsby | Jimmie Foxx |
| 18 | Mickey Mantle | Jose Canseco | Ken Griffey | Harry Stovey | Harry Stovey | Darryl Strawberry | Mike Schmidt |
| 19 | Dave Kingman | Lou Gehrig | Albert Belle | Ken Griffey Jr. | Charlie Hickman | Willie McCovey | Jose Canseco |
| 20 | Gorman Thomas | Jim Thome | Dick Allen | Stan Musial | Bill Nicholson | Glenn Davis | Albert Belle |
| 21 | George Foster | Eddie Mathews | Barry Bonds | Willie Stargell | Boog Powell | Wally Berger | Khris Davis |
| 22 | Johnny Bench | Reggie Jackson | Dean Palmer | Eddie Murray | Joe DiMaggio | Eddie Mathews | Ron Kittle |
| 23 | Darrell Evans | Ryan Howard | Hank Aaron | Mark McGwire | Eddie Mathews | Harry Stovey | Carlos Delgado |
| 24 | Andruw Jones | Albert Pujols | Jimmie Foxx | Mickey Mantle | Mickey Mantle | Frank Howard | Ken Griffey Jr. |
| 25 | Reggie Jackson | Hank Sauer | Mike Piazza | Al Simmons | Tris Speaker | Mel Ott | Hank Greenberg |
| pre-1950 in top 10 | 2 | 3 | 1 | 7 | 5 | 4 | 3 |
| pre-1950 in top 25 | 2 | 7 | 5 | 12 | 13 | 12 | 5 |
| proportion before 1950 | 0.219 | 0.259 | 0.33 | 0.368 | 0.309 | 0.396 | 0.292 |
| MLB-eligible population | 1871-2023 | 1871-2012 | 1901-1996 | 1871-1993 | 1876-2003 | 1876-1988 | 1871-2006 |
| chance in top 10 | 1 in 1.47 | 1 in 1.99 | 1 in 1.02 | 1 in 28.94 | 1 in 6.02 | 1 in 1.65 | 1 in 1.68 |
| chance in top 25 | 1 in 1.02 | 1 in 2.08 | 1 in 1.05 | 1 in 5.89 | 1 in 44.59 | 1 in 3.93 | 1 in 1.01 |
| rank | Career Full House | Schell Method | Raw Career |
|---|---|---|---|
| 1 | Ted Williams | Ted Williams | Ted Williams |
| 2 | Mike Trout | Babe Ruth | Babe Ruth |
| 3 | Barry Bonds | Rogers Hornsby | John McGraw |
| 4 | Joey Votto | Barry Bonds | Billy Hamilton |
| 5 | Babe Ruth | John McGraw | Lou Gehrig |
| 6 | Mickey Mantle | Billy Hamilton | Barry Bonds |
| 7 | Bryce Harper | Topsy Hartsel | Bill Joyce |
| 8 | Lou Gehrig | Mel Ott | Jud Wilson |
| 9 | Frank Thomas | Roy Thomas | Rogers Hornsby |
| 10 | Freddie Freeman | Mickey Mantle | Ty Cobb |
| 11 | Edgar Martinez | Wade Boggs | Jimmie Foxx |
| 12 | Lance Berkman | Frank Thomas | Tris Speaker |
| 13 | Paul Goldschmidt | Lou Gehrig | Eddie Collins |
| 14 | Wade Boggs | Rickey Henderson | Ferris Fain |
| 15 | Rickey Henderson | Stan Musial | Dan Brouthers |
| 16 | Jason Giambi | Edgar Martinez | Max Bishop |
| 17 | Joe Mauer | Ty Cobb | Shoeless Joe Jackson |
| 18 | Miguel Cabrera | Dan Brouthers | Mickey Mantle |
| 19 | Joe Morgan | Tris Speaker | Mickey Cochrane |
| 20 | Prince Fielder | Joe Cunningham | Frank Thomas |
| 21 | Brian Giles | George Gore | Edgar Martinez |
| 22 | Mike Hargrove | Eddie Collins | Turkey Stearnes |
| 23 | Manny Ramirez | Ross Youngs | Stan Musial |
| 24 | Jeff Bagwell | Mike Hargrove | Cupid Childs |
| 25 | Jim Thome | Jeff Bagwell | Wade Boggs |
| pre-1950 in top 10 | 3 | 8 | 8 |
| pre-1950 in top 25 | 3 | 16 | 19 |
| proportion before 1950 | 0.25 | 0.354 | 0.25 |
| MLB-eligible population | 1871-2012 | 1876-1993 | 1871-2012 |
| chance in top 10 | 1 in 2.11 | 1 in 191.34 | 1 in 2404.99 |
| chance in top 25 | 1 in 1.03 | 1 in 298.15 | 1 in 7867593.12 |
| p_value | prop_season |
|---|---|
| 0.05 | 0.15 |
| 0.10 | 0.24 |
| 0.20 | 0.34 |
| 0.30 | 0.39 |
| rank | name | BA | name | OBP | name | HR |
|---|---|---|---|---|---|---|
| 1 | Tony Gwynn | 0.342 | Ted Williams | 0.442 | Babe Ruth | 702 |
| 2 | Rod Carew | 0.329 | Mike Trout | 0.438 | Henry Aaron | 689 |
| 3 | Jose Altuve | 0.327 | Barry Bonds | 0.434 | Albert Pujols | 662 |
| 4 | Ichiro Suzuki | 0.327 | Joey Votto | 0.433 | Barry Bonds | 654 |
| 5 | Miguel Cabrera | 0.32 | Babe Ruth | 0.426 | Reggie Jackson | 578 |
| 6 | Roberto Clemente | 0.32 | Mickey Mantle | 0.42 | Willie Mays | 577 |
| 7 | Ty Cobb | 0.32 | Bryce Harper | 0.417 | Mike Schmidt | 561 |
| 8 | Joe DiMaggio | 0.318 | Lou Gehrig | 0.415 | Alex Rodriguez | 547 |
| 9 | Wade Boggs | 0.316 | Frank Thomas | 0.411 | Frank Robinson | 535 |
| 10 | Buster Posey | 0.316 | Freddie Freeman | 0.41 | Ken Griffey Jr | 528 |
| 11 | Mike Trout | 0.315 | Edgar Martinez | 0.41 | Willie Stargell | 528 |
| 12 | Freddie Freeman | 0.314 | Lance Berkman | 0.407 | David Ortiz | 521 |
| 13 | Joe Mauer | 0.314 | Paul Goldschmidt | 0.407 | Willie McCovey | 515 |
| 14 | Ted Williams | 0.314 | Wade Boggs | 0.405 | Harmon Killebrew | 508 |
| 15 | Stan Musial | 0.313 | Christian Yelich | 0.405 | Ted Williams | 503 |
| 16 | Willie Mays | 0.312 | Rickey Henderson | 0.404 | Mickey Mantle | 502 |
| 17 | Bill Terry | 0.312 | Jason Giambi | 0.403 | Eddie Mathews | 502 |
| 18 | Robinson Canó | 0.311 | Joe Mauer | 0.403 | Eddie Murray | 498 |
| 19 | Henry Aaron | 0.31 | Miguel Cabrera | 0.402 | Jimmie Foxx | 493 |
| 20 | Matty Alou | 0.31 | Joe Morgan | 0.402 | Stan Musial | 492 |
| 21 | Vladimir Guerrero | 0.31 | Prince Fielder | 0.4 | Dave Winfield | 491 |
| 22 | Derek Jeter | 0.31 | Brian Giles | 0.4 | Mark McGwire | 489 |
| 23 | Al Oliver | 0.31 | Mike Hargrove | 0.4 | Jim Thome | 484 |
| 24 | Lou Gehrig | 0.309 | Manny Ramirez | 0.4 | Miguel Cabrera | 480 |
| 25 | Edgar Martinez | 0.309 | Jeff Bagwell | 0.399 | Lou Gehrig | 479 |
| pre-1950 in top 10 | | 2 | | 3 | | 1 |
| pre-1950 in top 25 | | 6 | | 3 | | 5 |
| proportion before 1950 | | 0.259 | | 0.259 | | 0.259 |
| MLB-eligible population | | 1871-2012 | | 1871-2012 | | 1871-2012 |
| chance in top 10 | | 1 in 1.29 | | 1 in 1.99 | | 1 in 1.05 |
| chance in top 25 | | 1 in 1.51 | | 1 in 1.03 | | 1 in 1.23 |
| rank | name | ebWAR | name | efWAR |
|---|---|---|---|---|
| 1 | Barry Bonds | 153.89 | Barry Bonds | 145.24 |
| 2 | Willie Mays | 144.08 | Willie Mays | 135.39 |
| 3 | Henry Aaron | 135.6 | Henry Aaron | 128.05 |
| 4 | Babe Ruth | 127.29 | Babe Ruth | 120.44 |
| 5 | Alex Rodriguez | 120.29 | Stan Musial | 113.03 |
| 6 | Stan Musial | 119.51 | Alex Rodriguez | 110.3 |
| 7 | Ty Cobb | 114.48 | Ty Cobb | 108.77 |
| 8 | Albert Pujols | 111.86 | Ted Williams | 107.75 |
| 9 | Mike Schmidt | 109.58 | Mike Schmidt | 106.41 |
| 10 | Rickey Henderson | 109.08 | Rickey Henderson | 103.9 |
| 11 | Ted Williams | 107.86 | Albert Pujols | 97.34 |
| 12 | Tris Speaker | 102.26 | Joe Morgan | 96.07 |
| 13 | Joe Morgan | 100.17 | Frank Robinson | 95.92 |
| 14 | Frank Robinson | 99.93 | Mel Ott | 95.72 |
| 15 | Mel Ott | 99.74 | Tris Speaker | 95.13 |
| 16 | Cal Ripken Jr | 97.39 | Rogers Hornsby | 94.42 |
| 17 | Rogers Hornsby | 97.01 | Mickey Mantle | 94.3 |
| 18 | Lou Gehrig | 95.87 | Cal Ripken Jr | 93.24 |
| 19 | Mickey Mantle | 95.37 | Lou Gehrig | 92.98 |
| 20 | Carl Yastrzemski | 95.2 | Carl Yastrzemski | 92.64 |
| 21 | Adrián Beltré | 95.01 | Honus Wagner | 89.81 |
| 22 | Wade Boggs | 92.51 | Wade Boggs | 87.91 |
| 23 | Roberto Clemente | 91.37 | Mike Trout | 87.87 |
| 24 | Eddie Collins | 90.94 | Eddie Mathews | 86.38 |
| 25 | Mike Trout | 90.43 | Adrián Beltré | 86.34 |
| pre-1950 in top 10 | | 3 | | 4 |
| pre-1950 in top 25 | | 9 | | 9 |
| proportion before 1950 | | 0.263 | | 0.263 |
| chance in top 10 | | 1 in 1.95 | | 1 in 3.92 |
| chance in top 25 | | 1 in 5.31 | | 1 in 5.31 |
| rank | name | IP | name | ERA | name | SO |
|---|---|---|---|---|---|---|
| 1 | Greg Maddux | 5646 | Clayton Kershaw | 2.43 | Nolan Ryan | 6026 |
| 2 | Roger Clemens | 5456 | Pedro Martinez | 2.61 | Randy Johnson | 5136 |
| 3 | Nolan Ryan | 5319 | Greg Maddux | 2.77 | Roger Clemens | 4752 |
| 4 | Warren Spahn | 5112 | Lefty Grove | 2.78 | Steve Carlton | 4221 |
| 5 | Phil Niekro | 5082 | Roger Clemens | 2.81 | Walter Johnson | 3888 |
| 6 | Don Sutton | 5071 | Justin Verlander | 2.81 | Bert Blyleven | 3785 |
| 7 | Gaylord Perry | 4977 | Max Scherzer | 2.83 | Tom Seaver | 3656 |
| 8 | Tom Glavine | 4917 | Roy Halladay | 2.85 | Don Sutton | 3575 |
| 9 | Bert Blyleven | 4877 | Randy Johnson | 2.9 | Max Scherzer | 3506 |
| 10 | Steve Carlton | 4816 | Tom Seaver | 2.91 | Greg Maddux | 3473 |
| 11 | Walter Johnson | 4791 | Cole Hamels | 2.94 | Gaylord Perry | 3366 |
| 12 | Randy Johnson | 4724 | Carl Hubbell | 2.94 | Phil Niekro | 3364 |
| 13 | Cy Young | 4663 | Whitey Ford | 2.96 | Justin Verlander | 3297 |
| 14 | Tom Seaver | 4587 | Curt Schilling | 2.96 | Pedro Martinez | 3113 |
| 15 | Tommy John | 4585 | John Smoltz | 2.96 | John Smoltz | 3106 |
| 16 | Jamie Moyer | 4585 | Bob Gibson | 2.97 | Bob Feller | 3104 |
| 17 | Robin Roberts | 4435 | Jim Palmer | 2.97 | Fergie Jenkins | 3088 |
| 18 | Pete Alexander | 4356 | Zack Greinke | 3.01 | Curt Schilling | 3036 |
| 19 | Zack Greinke | 4278 | Tim Hudson | 3.03 | Clayton Kershaw | 2980 |
| 20 | Early Wynn | 4277 | Juan Marichal | 3.05 | CC Sabathia | 2960 |
| 21 | Fergie Jenkins | 4199 | Steve Carlton | 3.07 | Warren Spahn | 2955 |
| 22 | Dennis Martinez | 4182 | Tom Glavine | 3.1 | Zack Greinke | 2916 |
| 23 | CC Sabathia | 4169 | Félix Hernández | 3.1 | Frank Tanana | 2849 |
| 24 | Justin Verlander | 4107 | Kevin Brown | 3.12 | Bob Gibson | 2836 |
| 25 | Jack Morris | 4039 | Adam Wainwright | 3.12 | Lefty Grove | 2826 |
| pre-1950 in top 10 | | 1 | | 1 | | 1 |
| pre-1950 in top 25 | | 6 | | 2 | | 4 |
| proportion before 1950 | | 0.298 | | 0.28 | | 0.28 |
| MLB-eligible population | | 1871-2006 | | 1871-2012 | | 1871-2006 |
| chance in top 10 | | 1 in 1.03 | | 1 in 1.04 | | 1 in 1.04 |
| chance in top 25 | | 1 in 1.25 | | 1 in 1 | | 1 in 1.05 |
| rank | name | ebWAR | name | efWAR |
|---|---|---|---|---|
| 1 | Roger Clemens | 145.88 | Roger Clemens | 141.25 |
| 2 | Greg Maddux | 113.66 | Greg Maddux | 120.73 |
| 3 | Randy Johnson | 110.81 | Randy Johnson | 109.77 |
| 4 | Tom Seaver | 104.31 | Nolan Ryan | 108.3 |
| 5 | Lefty Grove | 102.54 | Bert Blyleven | 101.82 |
| 6 | Justin Verlander | 100.23 | Steve Carlton | 100.34 |
| 7 | Bert Blyleven | 97.69 | Lefty Grove | 98.8 |
| 8 | Phil Niekro | 94.37 | Justin Verlander | 95.07 |
| 9 | Clayton Kershaw | 93.78 | Gaylord Perry | 94.45 |
| 10 | Walter Johnson | 91.53 | Walter Johnson | 91.8 |
| 11 | Warren Spahn | 91.2 | Cy Young | 91.28 |
| 12 | Max Scherzer | 90.63 | Tom Seaver | 90.78 |
| 13 | Zack Greinke | 90.23 | Clayton Kershaw | 88.83 |
| 14 | Gaylord Perry | 89.5 | Don Sutton | 82.98 |
| 15 | Steve Carlton | 88.7 | Max Scherzer | 82.14 |
| 16 | Pedro Martinez | 87.2 | Pedro Martinez | 82.13 |
| 17 | Nolan Ryan | 86.85 | Zack Greinke | 80.28 |
| 18 | Mike Mussina | 84.62 | Mike Mussina | 80.08 |
| 19 | Curt Schilling | 82.09 | John Smoltz | 79.36 |
| 20 | Tom Glavine | 81.89 | Pete Alexander | 77.55 |
| 21 | Robin Roberts | 79.01 | Phil Niekro | 77.47 |
| 22 | Fergie Jenkins | 77.83 | Curt Schilling | 77 |
| 23 | Bob Gibson | 77 | Bob Gibson | 76.45 |
| 24 | Roy Halladay | 76.07 | Fergie Jenkins | 75.97 |
| 25 | CC Sabathia | 74.5 | Tommy John | 75.8 |
| pre-1950 in top 10 | | 2 | | 2 |
| pre-1950 in top 25 | | 4 | | 4 |
| proportion before 1950 | | 0.28 | | 0.28 |
| chance in top 10 | | 1 in 1.22 | | 1 in 1.22 |
| chance in top 25 | | 1 in 1.05 | | 1 in 1.05 |
| name | ebWAR | name | ebJAWS | name | efWAR | name | efJAWS |
|---|---|---|---|---|---|---|---|
| Barry Bonds | 153.89 | Barry Bonds | 109.14 | Barry Bonds | 145.24 | Roger Clemens | 103.54 |
| Roger Clemens | 145.88 | Roger Clemens | 107.47 | Roger Clemens | 141.25 | Barry Bonds | 103.21 |
| Willie Mays | 144.08 | Willie Mays | 105.16 | Willie Mays | 135.39 | Willie Mays | 98.72 |
| Babe Ruth | 137.98 | Babe Ruth | 100.70 | Henry Aaron | 128.05 | Babe Ruth | 90.37 |
| Henry Aaron | 135.60 | Henry Aaron | 95.11 | Greg Maddux | 120.73 | Henry Aaron | 89.52 |
| Alex Rodriguez | 120.29 | Alex Rodriguez | 91.66 | Babe Ruth | 120.28 | Greg Maddux | 88.51 |
| Stan Musial | 119.51 | Stan Musial | 88.38 | Stan Musial | 113.03 | Randy Johnson | 85.78 |
| Ty Cobb | 114.48 | Randy Johnson | 88.20 | Alex Rodriguez | 110.30 | Alex Rodriguez | 84.35 |
| Greg Maddux | 113.66 | Albert Pujols | 86.33 | Randy Johnson | 109.77 | Stan Musial | 83.61 |
| Albert Pujols | 111.86 | Greg Maddux | 85.60 | Ty Cobb | 108.77 | Ted Williams | 82.72 |
| Randy Johnson | 110.81 | Mike Schmidt | 84.52 | Nolan Ryan | 108.30 | Mike Schmidt | 82.20 |
| Mike Schmidt | 109.58 | Lefty Grove | 84.52 | Ted Williams | 107.75 | Lefty Grove | 79.31 |
| Rickey Henderson | 109.08 | Ted Williams | 83.54 | Mike Schmidt | 106.41 | Ty Cobb | 78.85 |
| Ted Williams | 107.86 | Ty Cobb | 82.62 | Rickey Henderson | 103.90 | Rickey Henderson | 78.83 |
| Tom Seaver | 104.31 | Rickey Henderson | 82.14 | Bert Blyleven | 101.82 | Steve Carlton | 78.32 |
| Lefty Grove | 102.54 | Justin Verlander | 80.34 | Steve Carlton | 100.34 | Nolan Ryan | 76.89 |
| Tris Speaker | 102.26 | Joe Morgan | 79.03 | Lefty Grove | 98.80 | Albert Pujols | 75.80 |
| Justin Verlander | 100.23 | Tom Seaver | 78.51 | Albert Pujols | 97.34 | Justin Verlander | 75.29 |
| Joe Morgan | 100.17 | Cal Ripken Jr | 77.54 | Joe Morgan | 96.07 | Joe Morgan | 75.14 |
| Frank Robinson | 99.93 | Mike Trout | 77.26 | Frank Robinson | 95.92 | Bert Blyleven | 75.12 |
| Mel Ott | 99.74 | Rogers Hornsby | 76.61 | Mel Ott | 95.72 | Rogers Hornsby | 74.27 |
| Bert Blyleven | 97.69 | Lou Gehrig | 75.70 | Tris Speaker | 95.13 | Mike Trout | 73.90 |
| Cal Ripken Jr | 97.39 | Wade Boggs | 75.61 | Justin Verlander | 95.07 | Mickey Mantle | 73.71 |
| Rogers Hornsby | 97.01 | Clayton Kershaw | 75.12 | Gaylord Perry | 94.45 | Cal Ripken Jr | 73.62 |
| Lou Gehrig | 95.87 | Mickey Mantle | 75.09 | Rogers Hornsby | 94.42 | Lou Gehrig | 73.41 |
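For readers unfamiliar with JAWS: in its usual definition it is the average of a player's career WAR and the total of his seven best seasons. The sketch below applies that definition to era-adjusted season values. Whether the ebJAWS and efJAWS columns above use exactly this seven-season peak is an assumption, and the helper `jaws` and the toy numbers are hypothetical.

```r
# Hypothetical helper: JAWS as the mean of career WAR and the best-seasons peak.
jaws <- function(season_war, n_peak = 7) {
  career <- sum(season_war)
  # sort seasons from best to worst and keep at most n_peak of them
  peak <- sum(head(sort(season_war, decreasing = TRUE), n_peak))
  (career + peak) / 2
}

# toy career of ten era-adjusted season WAR values (not real data)
toy_war <- c(1, 2, 8, 9, 7, 6, 5, 4, 3, 0)
toy_jaws <- jaws(toy_war)
```

In the toy career the total is 45 and the seven best seasons sum to 42, so `toy_jaws` is 43.5. Averaging the two rewards a high peak without ignoring longevity, which is why the JAWS columns reorder some players relative to the career WAR columns.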
| name | year | BA | name | year | ABpHR |
|---|---|---|---|---|---|
| Jose Altuve | 2014-2017 | 0.367 | Barry Bonds | 2001-2004 | 10.86 |
| Tony Gwynn | 1994-1997 | 0.366 | Mark McGwire | 1995-1998 | 11.15 |
| Rod Carew | 1974-1977 | 0.363 | Babe Ruth | 1918-1921 | 11.35 |
| Miguel Cabrera | 2010-2013 | 0.355 | Giancarlo Stanton | 2014-2017 | 11.85 |
| Wade Boggs | 1985-1988 | 0.353 | Albert Pujols | 2008-2011 | 12.20 |
| Ichiro Suzuki | 2001-2004 | 0.353 | Eddie Mathews | 1953-1956 | 12.34 |
| Barry Bonds | 2001-2004 | 0.352 | Willie Stargell | 1970-1973 | 12.45 |
| Joe Mauer | 2006-2009 | 0.350 | Jose Canseco | 1988-1991 | 12.46 |
| Roberto Clemente | 1964-1967 | 0.345 | Mike Schmidt | 1980-1983 | 12.57 |
| Joe DiMaggio | 1938-1941 | 0.345 | José Bautista | 2010-2013 | 12.68 |
| Albert Pujols | 2003-2006 | 0.343 | Gorman Thomas | 1978-1981 | 12.80 |
| Don Mattingly | 1984-1987 | 0.341 | Ralph Kiner | 1949-1952 | 12.87 |
| Mike Piazza | 1995-1998 | 0.341 | Khris Davis | 2015-2018 | 12.89 |
| Willie Mays | 1957-1960 | 0.340 | Aaron Judge | 2020-2023 | 13.25 |
| Matty Alou | 1966-1969 | 0.339 | Ted Williams | 1944-1947 | 13.26 |
| Tim Anderson | 2019-2022 | 0.338 | Sammy Sosa | 1998-2001 | 13.55 |
| Stan Musial | 1943-1946 | 0.338 | Frank Howard | 1967-1970 | 13.56 |
| Rogers Hornsby | 1922-1925 | 0.335 | Mickey Mantle | 1960-1963 | 13.66 |
| Ted Williams | 1943-1946 | 0.335 | David Ortiz | 2012-2015 | 13.71 |
| Ty Cobb | 1912-1915 | 0.334 | Nelson Cruz | 2017-2020 | 13.88 |
| Trea Turner | 2019-2022 | 0.334 | Jimmie Foxx | 1937-1940 | 13.91 |
| Henry Aaron | 1956-1959 | 0.333 | Carlos Pena | 2007-2010 | 13.91 |
| Cecil Cooper | 1980-1983 | 0.333 | Dave Kingman | 1976-1979 | 13.97 |
| Freddie Freeman | 2020-2023 | 0.333 | Darryl Strawberry | 1985-1988 | 13.98 |
| Nap Lajoie | 1901-1904 | 0.333 | Jim Thome | 2001-2004 | 14.01 |
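The table above reports each player's best four-season span. One way such a span could be located is to slide a four-season window over a player's seasons and pool hits and at bats within each window. The sketch below does this for BA on toy data; the helper `best_window_BA`, the toy numbers, and the assumption that consecutive rows are consecutive seasons are all hypothetical, and the report's own peak calculation may differ.

```r
# Hypothetical helper: best k-consecutive-season batting average for one player.
# `year`, `H` and `AB` are per-season vectors ordered by year.
best_window_BA <- function(year, H, AB, k = 4) {
  n <- length(year)
  if (n < k) return(NULL)  # career too short for a k-season window
  starts <- seq_len(n - k + 1)
  # pooled batting average within each window
  ba <- sapply(starts, function(s) {
    idx <- s:(s + k - 1)
    sum(H[idx]) / sum(AB[idx])
  })
  best <- which.max(ba)
  data.frame(span = paste0(year[best], "-", year[best + k - 1]),
             BA = round(ba[best], 3))
}

# toy player (not real data)
toy <- data.frame(year = 2001:2006,
                  H  = c(150, 180, 200, 190, 170, 140),
                  AB = c(500, 520, 540, 530, 520, 500))
res <- best_window_BA(toy$year, toy$H, toy$AB)
```

For the ABpHR columns the same window logic would apply with `sum(AB[idx]) / sum(HR[idx])` and `which.min`, since fewer at bats per home run is better.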
| name | year | ABpHR | diff | ystar | balance | u | n | p_beta | pop | HR_talent |
|---|---|---|---|---|---|---|---|---|---|---|
| Babe Ruth | 1919 | 10.95 | 0.0662271 | 0.0010814 | 0.9683747 | 0.9997320 | 118 | 0.9688654 | 2020530 | 5355122 |
| Babe Ruth | 1920 | 10.88 | 0.0508460 | 0.0003966 | 0.9846381 | 0.9998818 | 130 | 0.9847546 | 2517182 | 12061976 |
| Babe Ruth | 1926 | 10.83 | 0.0412598 | 0.0007058 | 0.9669191 | 0.9997455 | 130 | 0.9674563 | 3499956 | 8272038 |
| Barry Bonds | 2001 | 10.86 | 0.0661758 | 0.0014000 | 0.9594065 | 0.9998329 | 243 | 0.9602162 | 11200119 | 18901295 |
| Barry Bonds | 2002 | 10.83 | 0.0259097 | 0.0013214 | 0.9074378 | 0.9996206 | 244 | 0.9115765 | 11725596 | 9660871 |
| Barry Bonds | 2004 | 10.76 | 0.0373425 | 0.0016128 | 0.9204885 | 0.9996714 | 242 | 0.9235554 | 12829045 | 11901386 |
The year effect for bWAR from the talent perspective and the era-adjusted bWAR perspective will be covered in this section.
The two figures below show the bWAR and fWAR talent of replacement-level batters from 1871 to 2023. The line in each plot is a smooth fit obtained with a natural cubic spline. The talent of replacement-level batters follows a pattern similar to that of the MLB-eligible population. The seasons that deviate from the smoothed line are those affected by World War II, strikes, or the pandemic-shortened 2020 schedule, namely the 1943-1946, 1981, 1994, 1995, and 2020 seasons.
In this section we calculate the bWAR talent of a hypothetical 2023 hitter with 0 bWAR. Then, we compute his era-adjusted bWAR by mapping this hypothetical player to the other seasons from 1871 through 2023.
The figure above shows the era-adjusted bWAR values over time for a hypothetical batter with 0 bWAR in 2023 under the Full House Model. The fall in the mid-2000s corresponds to an increase in the MLB-eligible population. Despite the sharp decline in era-adjusted bWAR in the 1981 season, it does not follow that 1981 batters with 0 bWAR would perform better than 2023 batters with 0 bWAR. The 1981 season was short, and a large number of batters underperformed replacement-level players. The sharp decline in the 1981 season is resolved when we look at the era-adjusted bWAR and era-adjusted bWAR per game of a hypothetical 2023 hitter with 2 bWAR.
The two figures below show the era-adjusted bWAR and era-adjusted bWAR per game of a hypothetical 2023 hitter with 2 bWAR.
We also calculate the fWAR talent of a hypothetical 2023 hitter with 0 fWAR. We then compute his era-adjusted fWAR by mapping this hypothetical player to the other seasons from 1871 through 2023.
The figure above shows the era-adjusted fWAR values over time for a hypothetical batter with 0 fWAR in 2023 under the Full House Model. The fall in the mid-2000s corresponds to an increase in the MLB-eligible population. The sharp decline in era-adjusted fWAR in the 1981 season is likewise resolved when we look at the era-adjusted fWAR and era-adjusted fWAR per game of a hypothetical 2023 hitter with 2 fWAR.
The two figures below show the era-adjusted fWAR and era-adjusted fWAR per game of a hypothetical 2023 hitter with 2 fWAR.
One might suppose that MLB expansion has a significant effect on how talent scores vary over time, with talent being diluted as the league expands. However, after adding hypothetical poorly performing players to the early seasons, when there were fewer players, we find that the real players benefit little from this adjustment. We also attempted to establish replacement-player baselines for every season and to align these baselines at a common level. However, the baselines we construct are quite similar across seasons, and the talents of players from earlier eras do not improve much.
We also perform several simulations to examine the effect of season size. Using the same talent-generating process, we randomly generate different numbers of components from the same distribution and compare the top talent score, the top 50 talent scores, the top 100 talent scores, and the top 300 talent scores. In each simulation, we generate two groups of components from the standard normal distribution, one consisting of 600 components (group 1) and the other of 300 components (group 2). We then record the number of top talents for which group 1 is larger than group 2. We run this simulation 1000 times and display the distribution of the results. The graphs below show that the top talent scores are essentially the same for different numbers of components drawn from the same distribution.
We also run a second simulation to test the effect of season size. Instead of generating 600 components in group 1, we generate 900 components and repeat the rest of the simulation. The results are similar.
In this part, we compare BA in the season before each expansion with BA in the expansion season to see whether there is a significant difference between them. The expansion seasons are taken from Wikipedia: 1879, 1892, 1900, 1901, 1961, 1962, 1969, 1977, 1993, and 1998. We compute the mean and standard deviation of full-time batters' BA in the season before each expansion and in the expansion season itself; the results are shown below.
| yearID | mean | sd | yearID | mean | sd |
|---|---|---|---|---|---|
| 1878 | 0.281 | 0.043 | 1879 | 0.270 | 0.041 |
| 1891 | 0.264 | 0.027 | 1892 | 0.259 | 0.029 |
| 1899 | 0.293 | 0.037 | 1900 | 0.291 | 0.034 |
| 1900 | 0.291 | 0.034 | 1901 | 0.284 | 0.039 |
| 1960 | 0.268 | 0.025 | 1961 | 0.272 | 0.029 |
| 1961 | 0.272 | 0.029 | 1962 | 0.271 | 0.026 |
| 1968 | 0.250 | 0.027 | 1969 | 0.261 | 0.029 |
| 1976 | 0.264 | 0.030 | 1977 | 0.273 | 0.028 |
| 1992 | 0.265 | 0.026 | 1993 | 0.276 | 0.028 |
| 1997 | 0.276 | 0.027 | 1998 | 0.276 | 0.027 |
The average and standard error of the difference between the full-time batters’ BA before and in the expansion seasons are shown below.
| avg | se |
|---|---|
| -9e-04 | 0.0024288 |
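As a check on the summary above, the following sketch recomputes the average and standard error directly from the rounded season means in the preceding table. It assumes that the difference is taken as the pre-expansion season minus the expansion season (a sign convention inferred from the reported average, not stated in the text) and that the standard error is the sample standard deviation of the ten differences divided by the square root of ten. The object names are illustrative.

```r
# Rounded season means copied from the table above (full-time batters' BA).
before    <- c(0.281, 0.264, 0.293, 0.291, 0.268, 0.272, 0.250, 0.264, 0.265, 0.276)
expansion <- c(0.270, 0.259, 0.291, 0.284, 0.272, 0.271, 0.261, 0.273, 0.276, 0.276)

# Assumed sign convention: pre-expansion season minus expansion season.
d <- before - expansion

avg <- mean(d)                  # average difference
se  <- sd(d) / sqrt(length(d))  # standard error of the average difference

data.frame(avg = avg, se = se)
```

With these rounded inputs the result agrees with the reported values to the displayed precision, and the average difference is smaller in magnitude than one standard error.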
We also show Willie Mays’s seasonal bWAR per game over his 22-season MLB career from 1951 to 1973, excluding the 1953 season. The red dots mark the seasons in which MLB expanded. The results show that expansion had no significant positive or negative effect on Willie Mays’s prime seasons or on the tail of his career.
We also show Randy Johnson’s seasonal bWAR per game over his 22-season MLB career from 1988 to 2009. The red dots mark the seasons in which MLB expanded. The results show that expansion had no significant positive or negative effect on Randy Johnson’s prime seasons.
The figure below shows four statistical moments of the batting average distribution from the 1871 to 2021 seasons. Points in red correspond to seasons surrounding the peak of WWII (1941-1946).
| | correct Pareto dist | incorrect Pareto dist | normal dist | folded normal dist |
|---|---|---|---|---|
| beat or ties | 1 | 1 | 1.000 | 1 |
| strictly beat | 1 | 1 | 0.995 | 1 |
| name | year | zscore_BA | normality_BA | zscore_obs_BA | normality_obs_BA |
|---|---|---|---|---|---|
| Ty Cobb | 1911 | 3.425657 | 0.1625329 | 3.460512 | 0.0436831 |
| Tony Gwynn | 1997 | 3.822984 | 0.0167243 | 3.339753 | 0.1744079 |
Based on the MLB-eligible population computed in the supplementary materials, we can obtain the proportion of the cumulative MLB-eligible population that falls before 1950 for different season spans. For example, suppose we want this proportion for the span from 1871 to 2006. Assuming the MLB-eligible population is evenly distributed across ages 20 to 29, the cumulative number of MLB-eligible males aged 20 to 29 from 1871 to 1879 equals 90% of the MLB-eligible population in 1871. Similarly, the cumulative number of MLB-eligible males aged 20 to 29 from 2000 to 2006 equals 60% of the MLB-eligible population in 2006. The code below computes the proportion of the MLB-eligible population before 1950 for the span from 1871 to 2006.
# One MLB-eligible population value per season
MLBpops <- bat_dat %>% group_by(yearID) %>%
  summarise(population = round(mean(unique(pops)), 2))
# Keep 1871, the first season of each decade from 1880 to 2000, and 2006
n <- MLBpops[MLBpops$yearID %in% c(1871, seq(1880, 2000, 10), 2006), ]
p <- n
# The 1871 row stands for the nine seasons 1871-1879: weight 9/10
p$population[1] <- n$population[1] / 10 * 9
# The 2006 row (the last row) receives weight 6/10
p$population[nrow(p)] <- n$population[nrow(n)] / 10 * 6
# cpp: cumulative proportion of the weighted population up to each row
o <- p %>% mutate(population = round(population, 2)) %>%
  mutate(cpp = round(cumsum(p$population) / sum(p$population), 3))
kable(o[o$yearID == 1950, c(1, 3)]) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| yearID | cpp |
|---|---|
| 1950 | 0.292 |
| name | ebWAR | name | efWAR |
|---|---|---|---|
| Barry Bonds | 153.89 | Barry Bonds | 145.24 |
| Roger Clemens | 145.88 | Roger Clemens | 141.25 |
| Willie Mays | 144.08 | Willie Mays | 135.39 |
| Babe Ruth | 137.98 | Henry Aaron | 128.05 |
| Henry Aaron | 135.60 | Greg Maddux | 120.73 |
| Alex Rodriguez | 120.29 | Babe Ruth | 120.28 |
| Stan Musial | 119.51 | Stan Musial | 113.03 |
| Ty Cobb | 114.48 | Alex Rodriguez | 110.30 |
| Greg Maddux | 113.66 | Randy Johnson | 109.77 |
| Albert Pujols | 111.86 | Ty Cobb | 108.77 |
| Randy Johnson | 110.81 | Nolan Ryan | 108.30 |
| Mike Schmidt | 109.58 | Ted Williams | 107.75 |
| Rickey Henderson | 109.08 | Mike Schmidt | 106.41 |
| Ted Williams | 107.86 | Rickey Henderson | 103.90 |
| Tom Seaver | 104.31 | Bert Blyleven | 101.82 |
| Lefty Grove | 102.54 | Steve Carlton | 100.34 |
| Tris Speaker | 102.26 | Lefty Grove | 98.80 |
| Justin Verlander | 100.23 | Albert Pujols | 97.34 |
| Joe Morgan | 100.17 | Joe Morgan | 96.07 |
| Frank Robinson | 99.93 | Frank Robinson | 95.92 |
| Mel Ott | 99.74 | Mel Ott | 95.72 |
| Bert Blyleven | 97.69 | Tris Speaker | 95.13 |
| Cal Ripken Jr | 97.39 | Justin Verlander | 95.07 |
| Rogers Hornsby | 97.01 | Gaylord Perry | 94.45 |
| Lou Gehrig | 95.87 | Rogers Hornsby | 94.42 |
In this section, we perform a sensitivity analysis for the scenario in which some talented potential baseball players never begin a career in baseball. For example, some argue that Kyler Murray and Pat Mahomes, who play in the NFL, are also two of the most talented potential baseball players. Competition from other sports is fierce at the upper end of the talent pool, where opportunities in multiple sports are common.
In this sensitivity analysis, we assume that the 10th, 20th, …, 100th most talented potential baseball players never begin a career in baseball. As a result, the player with the 10th largest bWAR or fWAR is paired with the 11th largest talent, the player with the 20th largest bWAR or fWAR is paired with the 22nd largest talent, and so on. We then map their talents into the common mapping environment built earlier and compute the era-adjusted bWAR and fWAR. We perform this analysis for the seasons after 1950, motivated by the effect of baseball integration, and separately for the seasons after 1994, motivated by the effect of the MLB strike.
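The pairing rule can be illustrated with ranks alone. The sketch below assumes a hypothetical season with 300 talent ranks (the number is arbitrary) and removes the 10th, 20th, …, 100th ranks; the k-th best observed player is then paired with the k-th remaining talent rank. It illustrates only the re-pairing step, not the talent estimation or the mapping into the common environment, and the object names are illustrative.

```r
n_talent  <- 300                      # hypothetical number of talent ranks
removed   <- seq(10, 100, by = 10)    # ranks assumed lost to other sports
remaining <- setdiff(seq_len(n_talent), removed)

# The player with the k-th largest bWAR or fWAR receives the k-th remaining
# talent rank.
pairing <- data.frame(player_rank = seq_along(remaining),
                      talent_rank = remaining)

pairing[c(9, 10, 19, 20), ]
```

Under this rule the 10th-ranked player receives the 11th talent rank and the 20th-ranked player receives the 22nd, in line with the description above.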
The tables below are the top 25 ebWAR and efWAR for batters and pitchers combined with respect to this sensitivity analysis.
| without rm | ebWAR | rm after 1950 | ebWAR | rm after 1994 | ebWAR |
|---|---|---|---|---|---|
| Barry Bonds | 153.89 | Barry Bonds | 153.89 | Barry Bonds | 153.89 |
| Roger Clemens | 145.88 | Roger Clemens | 145.88 | Roger Clemens | 145.88 |
| Willie Mays | 144.08 | Willie Mays | 144.08 | Willie Mays | 144.08 |
| Babe Ruth | 137.98 | Babe Ruth | 137.98 | Babe Ruth | 137.98 |
| Henry Aaron | 135.60 | Henry Aaron | 135.60 | Henry Aaron | 135.60 |
| Alex Rodriguez | 120.29 | Alex Rodriguez | 120.29 | Alex Rodriguez | 120.29 |
| Stan Musial | 119.51 | Stan Musial | 119.51 | Stan Musial | 119.51 |
| Ty Cobb | 114.48 | Ty Cobb | 114.48 | Ty Cobb | 114.48 |
| Greg Maddux | 113.66 | Greg Maddux | 113.66 | Greg Maddux | 113.66 |
| Albert Pujols | 111.86 | Albert Pujols | 111.86 | Albert Pujols | 111.86 |
| Randy Johnson | 110.81 | Randy Johnson | 110.81 | Randy Johnson | 110.81 |
| Mike Schmidt | 109.58 | Mike Schmidt | 109.58 | Mike Schmidt | 109.58 |
| Rickey Henderson | 109.08 | Rickey Henderson | 109.08 | Rickey Henderson | 109.08 |
| Ted Williams | 107.86 | Ted Williams | 107.86 | Ted Williams | 107.86 |
| Tom Seaver | 104.31 | Tom Seaver | 104.31 | Tom Seaver | 104.31 |
| Lefty Grove | 102.54 | Lefty Grove | 102.54 | Lefty Grove | 102.54 |
| Tris Speaker | 102.26 | Tris Speaker | 102.26 | Tris Speaker | 102.26 |
| Justin Verlander | 100.23 | Justin Verlander | 100.23 | Justin Verlander | 100.23 |
| Joe Morgan | 100.17 | Joe Morgan | 100.17 | Joe Morgan | 100.17 |
| Frank Robinson | 99.93 | Frank Robinson | 99.93 | Frank Robinson | 99.93 |
| Mel Ott | 99.74 | Mel Ott | 99.74 | Mel Ott | 99.74 |
| Bert Blyleven | 97.69 | Bert Blyleven | 97.69 | Bert Blyleven | 97.69 |
| Cal Ripken Jr | 97.39 | Cal Ripken Jr | 97.39 | Cal Ripken Jr | 97.39 |
| Rogers Hornsby | 97.01 | Rogers Hornsby | 97.01 | Rogers Hornsby | 97.01 |
| Lou Gehrig | 95.87 | Lou Gehrig | 95.87 | Lou Gehrig | 95.87 |
| without rm | efWAR | rm after 1950 | efWAR | rm after 1994 | efWAR |
|---|---|---|---|---|---|
| Barry Bonds | 145.24 | Barry Bonds | 145.24 | Barry Bonds | 145.24 |
| Roger Clemens | 141.25 | Roger Clemens | 141.25 | Roger Clemens | 141.25 |
| Willie Mays | 135.39 | Willie Mays | 135.39 | Willie Mays | 135.39 |
| Henry Aaron | 128.05 | Henry Aaron | 128.05 | Henry Aaron | 128.05 |
| Greg Maddux | 120.73 | Greg Maddux | 120.73 | Greg Maddux | 120.73 |
| Babe Ruth | 120.28 | Babe Ruth | 120.28 | Babe Ruth | 120.28 |
| Stan Musial | 113.03 | Stan Musial | 113.03 | Stan Musial | 113.03 |
| Alex Rodriguez | 110.30 | Alex Rodriguez | 110.30 | Alex Rodriguez | 110.30 |
| Randy Johnson | 109.77 | Randy Johnson | 109.77 | Randy Johnson | 109.77 |
| Ty Cobb | 108.77 | Ty Cobb | 108.77 | Ty Cobb | 108.77 |
| Nolan Ryan | 108.30 | Nolan Ryan | 108.30 | Nolan Ryan | 108.30 |
| Ted Williams | 107.75 | Ted Williams | 107.75 | Ted Williams | 107.75 |
| Mike Schmidt | 106.41 | Mike Schmidt | 106.41 | Mike Schmidt | 106.41 |
| Rickey Henderson | 103.90 | Rickey Henderson | 103.90 | Rickey Henderson | 103.90 |
| Bert Blyleven | 101.82 | Bert Blyleven | 101.82 | Bert Blyleven | 101.82 |
| Steve Carlton | 100.34 | Steve Carlton | 100.34 | Steve Carlton | 100.34 |
| Lefty Grove | 98.80 | Lefty Grove | 98.80 | Lefty Grove | 98.80 |
| Albert Pujols | 97.34 | Albert Pujols | 97.34 | Albert Pujols | 97.34 |
| Joe Morgan | 96.07 | Joe Morgan | 96.07 | Joe Morgan | 96.07 |
| Frank Robinson | 95.92 | Frank Robinson | 95.92 | Frank Robinson | 95.92 |
| Mel Ott | 95.72 | Mel Ott | 95.72 | Mel Ott | 95.72 |
| Tris Speaker | 95.13 | Tris Speaker | 95.13 | Tris Speaker | 95.13 |
| Justin Verlander | 95.07 | Justin Verlander | 95.07 | Justin Verlander | 95.07 |
| Gaylord Perry | 94.45 | Gaylord Perry | 94.45 | Gaylord Perry | 94.45 |
| Rogers Hornsby | 94.42 | Rogers Hornsby | 94.42 | Rogers Hornsby | 94.42 |
The tables below are the top 25 ebJAWS and efJAWS for batters and pitchers combined with respect to this sensitivity analysis.
| without rm | ebJAWS | rm after 1950 | ebJAWS | rm after 1994 | ebJAWS |
|---|---|---|---|---|---|
| Barry Bonds | 109.14 | Barry Bonds | 108.86 | Barry Bonds | 109.01 |
| Roger Clemens | 107.47 | Roger Clemens | 106.80 | Roger Clemens | 107.03 |
| Willie Mays | 105.16 | Willie Mays | 104.96 | Willie Mays | 105.32 |
| Babe Ruth | 100.70 | Babe Ruth | 100.70 | Babe Ruth | 100.70 |
| Henry Aaron | 95.11 | Henry Aaron | 94.38 | Henry Aaron | 95.03 |
| Alex Rodriguez | 91.66 | Alex Rodriguez | 91.01 | Alex Rodriguez | 91.01 |
| Stan Musial | 88.38 | Stan Musial | 87.15 | Stan Musial | 88.24 |
| Randy Johnson | 88.20 | Randy Johnson | 87.10 | Randy Johnson | 87.78 |
| Albert Pujols | 86.33 | Lefty Grove | 84.52 | Greg Maddux | 84.78 |
| Greg Maddux | 85.60 | Mike Schmidt | 84.25 | Mike Schmidt | 84.53 |
| Mike Schmidt | 84.52 | Albert Pujols | 84.14 | Lefty Grove | 84.52 |
| Lefty Grove | 84.52 | Greg Maddux | 83.77 | Albert Pujols | 84.14 |
| Ted Williams | 83.54 | Ted Williams | 83.36 | Ted Williams | 83.59 |
| Ty Cobb | 82.62 | Ty Cobb | 82.60 | Ty Cobb | 82.60 |
| Rickey Henderson | 82.14 | Rickey Henderson | 80.89 | Rickey Henderson | 81.05 |
| Justin Verlander | 80.19 | Joe Morgan | 78.01 | Joe Morgan | 79.04 |
| Joe Morgan | 79.03 | Tom Seaver | 77.36 | Tom Seaver | 78.49 |
| Tom Seaver | 78.50 | Rogers Hornsby | 76.61 | Cal Ripken Jr | 76.85 |
| Cal Ripken Jr | 77.54 | Cal Ripken Jr | 75.98 | Rogers Hornsby | 76.61 |
| Mike Trout | 77.26 | Lou Gehrig | 75.70 | Lou Gehrig | 75.70 |
| Rogers Hornsby | 76.61 | Wade Boggs | 74.72 | Mickey Mantle | 75.11 |
| Lou Gehrig | 75.70 | Tris Speaker | 74.67 | Bert Blyleven | 75.01 |
| Wade Boggs | 75.61 | Mickey Mantle | 74.50 | Wade Boggs | 75.00 |
| Clayton Kershaw | 75.10 | Mel Ott | 74.19 | Tris Speaker | 74.67 |
| Mickey Mantle | 75.09 | Justin Verlander | 73.64 | Mel Ott | 74.19 |
| without rm | efJAWS | rm after 1950 | efJAWS | rm after 1994 | efJAWS |
|---|---|---|---|---|---|
| Roger Clemens | 103.54 | Roger Clemens | 103.11 | Roger Clemens | 103.26 |
| Barry Bonds | 103.21 | Barry Bonds | 102.94 | Barry Bonds | 103.06 |
| Willie Mays | 98.72 | Willie Mays | 98.49 | Willie Mays | 98.84 |
| Babe Ruth | 90.37 | Babe Ruth | 90.37 | Babe Ruth | 90.37 |
| Henry Aaron | 89.52 | Henry Aaron | 88.70 | Henry Aaron | 89.45 |
| Greg Maddux | 88.51 | Greg Maddux | 87.72 | Greg Maddux | 88.14 |
| Randy Johnson | 85.78 | Randy Johnson | 85.08 | Randy Johnson | 85.52 |
| Alex Rodriguez | 84.35 | Alex Rodriguez | 83.77 | Alex Rodriguez | 83.77 |
| Stan Musial | 83.61 | Stan Musial | 82.43 | Stan Musial | 83.46 |
| Ted Williams | 82.72 | Ted Williams | 82.41 | Ted Williams | 82.78 |
| Mike Schmidt | 82.20 | Mike Schmidt | 81.85 | Mike Schmidt | 82.21 |
| Lefty Grove | 79.31 | Lefty Grove | 79.31 | Lefty Grove | 79.31 |
| Ty Cobb | 78.85 | Ty Cobb | 78.83 | Ty Cobb | 78.83 |
| Rickey Henderson | 78.83 | Rickey Henderson | 77.38 | Steve Carlton | 78.32 |
| Steve Carlton | 78.32 | Steve Carlton | 77.22 | Rickey Henderson | 77.72 |
| Nolan Ryan | 76.91 | Nolan Ryan | 75.67 | Nolan Ryan | 76.94 |
| Albert Pujols | 75.80 | Bert Blyleven | 74.31 | Joe Morgan | 75.14 |
| Joe Morgan | 75.14 | Rogers Hornsby | 74.27 | Bert Blyleven | 75.12 |
| Bert Blyleven | 75.12 | Joe Morgan | 73.99 | Rogers Hornsby | 74.27 |
| Justin Verlander | 74.86 | Albert Pujols | 73.53 | Mickey Mantle | 73.72 |
| Rogers Hornsby | 74.27 | Lou Gehrig | 73.41 | Albert Pujols | 73.53 |
| Mike Trout | 73.90 | Mickey Mantle | 73.19 | Lou Gehrig | 73.41 |
| Mickey Mantle | 73.71 | Cal Ripken Jr | 72.06 | Cal Ripken Jr | 72.77 |
| Cal Ripken Jr | 73.62 | Walter Johnson | 72.00 | Walter Johnson | 72.00 |
| Lou Gehrig | 73.41 | Mel Ott | 71.10 | Mel Ott | 71.10 |
We test four factors in our Full House Model, using batting average as the illustration: the park-factor effect, population change, the component distribution, and the season-size effect. The table shows the era-adjusted BA of Tony Gwynn in the 1997 season minus the era-adjusted BA of Ty Cobb in the 1911 season under different configurations.
The PF column indicates whether the park-factor adjustment is applied to BA: YES means it is applied and NO means it is not.
The pops column indicates the assumption used for the MLB-eligible population. 0.5_favor corresponds to 50% favorite sport, which is the MLB-eligible population we use; 0.75_favor corresponds to 75% favorite sport; 1_favor corresponds to 100% favorite sport; constant assumes the MLB-eligible population did not change over time and sets it to 1 million; erosion accounts for minor league erosion. Details on how we estimate the MLB-eligible population can be found in the tech report.
The para column indicates whether BA in each season is measured with a parametric distribution (para) or a nonparametric distribution (nonpara).
The league column indicates which league size is used as the number of components in each season. historical uses the historical league size; fixed takes the maximum number of components over all seasons and uses this value as the number of components in every season.
The diff column shows the era-adjusted BA of Tony Gwynn in the 1997 season minus the era-adjusted BA of Ty Cobb in the 1911 season.
| PF | pops | para | league | diff |
|---|---|---|---|---|
| YES | 0.5_favor | para | historical | 0.020 |
| YES | 0.5_favor | para | fixed | 0.008 |
| YES | 0.5_favor | nonpara | historical | 0.040 |
| YES | 0.5_favor | nonpara | fixed | 0.040 |
| YES | 0.75_favor | para | historical | 0.017 |
| YES | 0.75_favor | para | fixed | 0.006 |
| YES | 0.75_favor | nonpara | historical | 0.035 |
| YES | 0.75_favor | nonpara | fixed | 0.035 |
| YES | 1_favor | para | historical | 0.014 |
| YES | 1_favor | para | fixed | 0.003 |
| YES | 1_favor | nonpara | historical | 0.023 |
| YES | 1_favor | nonpara | fixed | 0.023 |
| YES | constant | para | historical | 0.006 |
| YES | constant | para | fixed | -0.005 |
| YES | constant | nonpara | historical | -0.002 |
| YES | constant | nonpara | fixed | -0.002 |
| YES | erosion | para | historical | 0.010 |
| YES | erosion | para | fixed | -0.001 |
| YES | erosion | nonpara | historical | 0.006 |
| YES | erosion | nonpara | fixed | 0.006 |
| NO | 0.5_favor | para | historical | 0.005 |
| NO | 0.5_favor | para | fixed | 0.002 |
| NO | 0.5_favor | nonpara | historical | 0.029 |
| NO | 0.5_favor | nonpara | fixed | 0.029 |
| NO | 0.75_favor | para | historical | 0.003 |
| NO | 0.75_favor | para | fixed | -0.001 |
| NO | 0.75_favor | nonpara | historical | 0.027 |
| NO | 0.75_favor | nonpara | fixed | 0.027 |
| NO | 1_favor | para | historical | 0.000 |
| NO | 1_favor | para | fixed | -0.003 |
| NO | 1_favor | nonpara | historical | 0.015 |
| NO | 1_favor | nonpara | fixed | 0.015 |
| NO | constant | para | historical | -0.007 |
| NO | constant | para | fixed | -0.010 |
| NO | constant | nonpara | historical | -0.003 |
| NO | constant | nonpara | fixed | -0.003 |
| NO | erosion | para | historical | -0.004 |
| NO | erosion | para | fixed | -0.008 |
| NO | erosion | nonpara | historical | 0.007 |
| NO | erosion | nonpara | fixed | 0.007 |
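The 40 rows of this table form a full factorial design over the four factors. A minimal sketch of how such a grid could be enumerated is shown below. The object name configs is illustrative, and the diff values themselves would come from rerunning the model under each configuration, which is not shown here.

```r
# expand.grid varies its first argument fastest, so the factors are listed
# from fastest- to slowest-changing to match the row order of the table.
configs <- expand.grid(
  league = c("historical", "fixed"),
  para   = c("para", "nonpara"),
  pops   = c("0.5_favor", "0.75_favor", "1_favor", "constant", "erosion"),
  PF     = c("YES", "NO"),
  stringsAsFactors = FALSE
)[, c("PF", "pops", "para", "league")]

nrow(configs)     # 2 * 5 * 2 * 2 configurations
head(configs, 4)
```

Looping over the rows of such a grid keeps each configuration explicit and makes it easy to confirm that no combination of settings is skipped.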
We also use bWAR to test the effects of population change, season size, and the trimming method in our Full House Model. The table shows the career bWAR of Willie Mays minus the career bWAR of Babe Ruth under different configurations, before and after applying the trimming method.
The pops column indicates the assumption used for the MLB-eligible population. 0.5_favor corresponds to 50% favorite sport, which is the MLB-eligible population we use; 0.75_favor corresponds to 75% favorite sport; 1_favor corresponds to 100% favorite sport; constant assumes the MLB-eligible population did not change over time and sets it to 1 million; erosion accounts for minor league erosion. Details on how we estimate the MLB-eligible population can be found in the tech report.
The league column indicates which league size is used as the number of components in each season. historical uses the historical league size; fixed takes the maximum number of components over all seasons and uses this value as the number of components in every season.
The diff_after column shows the career bWAR of Willie Mays minus the career bWAR of Babe Ruth after applying the trimming method, and the diff_before column shows the same difference before applying it.
| pops | league | diff_after | diff_before |
|---|---|---|---|
| 0.5_favor | historical | 6.22 | 11.50 |
| 0.5_favor | fixed | 6.21 | 11.52 |
| 0.75_favor | historical | 1.37 | 6.57 |
| 0.75_favor | fixed | 1.35 | 6.61 |
| 1_favor | historical | -2.76 | -0.98 |
| 1_favor | fixed | -2.82 | -0.78 |
| constant | historical | -20.95 | -21.34 |
| constant | fixed | -20.97 | -22.01 |
| erosion | historical | -16.68 | -18.93 |
| erosion | fixed | -18.97 | -19.63 |
| name | eJAWS | name | eJAWS | name | eJAWS | name | eJAWS | name | eJAWS |
|---|---|---|---|---|---|---|---|---|---|
| Barry Bonds | 106.17 | Barry Bonds | 106.47 | Willie Mays | 106.10 | Babe Ruth | 112.24 | Babe Ruth | 117.14 |
| Roger Clemens | 105.50 | Roger Clemens | 104.84 | Babe Ruth | 105.83 | Barry Bonds | 106.50 | Ty Cobb | 108.87 |
| Willie Mays | 101.94 | Willie Mays | 104.02 | Barry Bonds | 105.22 | Roger Clemens | 104.36 | Willie Mays | 107.52 |
| Babe Ruth | 95.50 | Babe Ruth | 100.28 | Roger Clemens | 103.35 | Willie Mays | 104.19 | Barry Bonds | 106.98 |
| Henry Aaron | 92.32 | Henry Aaron | 94.03 | Henry Aaron | 97.05 | Ty Cobb | 100.50 | Cy Young | 106.41 |
| Alex Rodriguez | 88.00 | Stan Musial | 88.33 | Ty Cobb | 92.31 | Walter Johnson | 95.30 | Roger Clemens | 105.75 |
| Greg Maddux | 87.06 | Alex Rodriguez | 87.46 | Stan Musial | 91.79 | Henry Aaron | 93.98 | Cap Anson | 105.38 |
| Randy Johnson | 86.99 | Greg Maddux | 86.28 | Lefty Grove | 89.55 | Stan Musial | 93.94 | Walter Johnson | 104.81 |
| Stan Musial | 86.00 | Randy Johnson | 86.07 | Ted Williams | 87.53 | Lefty Grove | 92.66 | Honus Wagner | 101.93 |
| Mike Schmidt | 83.36 | Ty Cobb | 86.04 | Alex Rodriguez | 85.86 | Honus Wagner | 91.14 | Henry Aaron | 98.54 |
| Ted Williams | 83.13 | Ted Williams | 85.27 | Walter Johnson | 85.74 | Tris Speaker | 90.48 | Stan Musial | 98.13 |
| Lefty Grove | 81.91 | Lefty Grove | 85.18 | Greg Maddux | 84.66 | Cy Young | 89.53 | Tris Speaker | 97.73 |
| Albert Pujols | 81.06 | Mike Schmidt | 83.93 | Mike Schmidt | 84.41 | Ted Williams | 89.32 | Lefty Grove | 93.98 |
| Ty Cobb | 80.74 | Rickey Henderson | 79.87 | Randy Johnson | 84.18 | Rogers Hornsby | 88.20 | Eddie Collins | 93.82 |
| Rickey Henderson | 80.48 | Albert Pujols | 79.37 | Rogers Hornsby | 83.75 | Alex Rodriguez | 87.94 | Ted Williams | 92.09 |
| Justin Verlander | 77.82 | Rogers Hornsby | 79.16 | Tris Speaker | 82.72 | Greg Maddux | 86.59 | Rogers Hornsby | 92.03 |
| Joe Morgan | 77.09 | Joe Morgan | 78.39 | Lou Gehrig | 81.15 | Randy Johnson | 85.56 | Nap Lajoie | 88.60 |
| Cal Ripken Jr | 75.58 | Walter Johnson | 77.96 | Joe Morgan | 79.85 | Eddie Collins | 85.07 | Greg Maddux | 87.34 |
| Mike Trout | 75.58 | Tris Speaker | 77.37 | Mel Ott | 79.75 | Mel Ott | 84.97 | Randy Johnson | 86.86 |
| Rogers Hornsby | 75.44 | Lou Gehrig | 77.32 | Honus Wagner | 79.60 | Lou Gehrig | 84.76 | Mel Ott | 86.76 |
| Bert Blyleven | 75.06 | Justin Verlander | 76.57 | Rickey Henderson | 78.68 | Mike Schmidt | 83.19 | Lou Gehrig | 86.26 |
| Lou Gehrig | 74.56 | Mickey Mantle | 76.25 | Mickey Mantle | 78.68 | Cap Anson | 81.33 | Alex Rodriguez | 85.12 |
| Mickey Mantle | 74.40 | Mike Trout | 75.43 | Tom Seaver | 76.67 | Albert Pujols | 81.03 | Pete Alexander | 83.86 |
| Steve Carlton | 74.40 | Bert Blyleven | 75.43 | Steve Carlton | 76.54 | Rickey Henderson | 80.28 | Mike Schmidt | 83.62 |
| Tom Seaver | 74.15 | Steve Carlton | 75.33 | Frank Robinson | 76.46 | Jimmie Foxx | 78.52 | Mickey Mantle | 83.50 |
We test three factors in our Full House Model, using home runs as the illustration: the park-factor effect, population change, and the season-size effect. The table shows the era-adjusted HR of Barry Bonds in the 2001 season minus the era-adjusted HR of Babe Ruth in the 1920 season under different configurations.
The PF column indicates whether the park-factor adjustment is applied to HR: YES means it is applied and NO means it is not.
The pops column indicates the assumption used for the MLB-eligible population. 0.5_favor corresponds to 50% favorite sport, which is the MLB-eligible population we use; 0.75_favor corresponds to 75% favorite sport; 1_favor corresponds to 100% favorite sport; constant assumes the MLB-eligible population did not change over time and sets it to 1 million; erosion accounts for minor league erosion. Details on how we estimate the MLB-eligible population can be found in the tech report.
The league column indicates which league size is used as the number of components in each season. historical uses the historical league size; fixed takes the maximum number of components over all seasons and uses this value as the number of components in every season.
The diff column shows the era-adjusted HR of Barry Bonds in the 2001 season minus the era-adjusted HR of Babe Ruth in the 1920 season.
| PF | pops | league | diff |
|---|---|---|---|
| YES | 0.5_favor | historical | -0.0366554 |
| YES | 0.5_favor | fixed | -0.0366529 |
| YES | 0.75_favor | historical | -0.0115940 |
| YES | 0.75_favor | fixed | -0.0115925 |
| YES | 1_favor | historical | 0.0066636 |
| YES | 1_favor | fixed | 0.0066644 |
| YES | erosion | historical | 0.0196551 |
| YES | erosion | fixed | 0.0196557 |
| YES | constant | historical | 0.0284328 |
| YES | constant | fixed | 0.0284331 |
| NO | 0.5_favor | historical | -0.0099727 |
| NO | 0.5_favor | fixed | -0.0099658 |
| NO | 0.75_favor | historical | 0.0378487 |
| NO | 0.75_favor | fixed | 0.0378545 |
| NO | 1_favor | historical | 0.0841428 |
| NO | 1_favor | fixed | 0.0841461 |
| NO | erosion | historical | 0.1004012 |
| NO | erosion | fixed | 0.1004028 |
| NO | constant | historical | 0.0995427 |
| NO | constant | fixed | 0.0995433 |
| w/o rm | eJAWS | rm 1950 | eJAWS | rm 1994 | eJAWS |
|---|---|---|---|---|---|
| Barry Bonds | 106.17 | Barry Bonds | 105.90 | Barry Bonds | 106.03 |
| Roger Clemens | 105.50 | Roger Clemens | 104.96 | Roger Clemens | 105.15 |
| Willie Mays | 101.94 | Willie Mays | 101.72 | Willie Mays | 102.08 |
| Babe Ruth | 95.50 | Babe Ruth | 95.50 | Babe Ruth | 95.50 |
| Henry Aaron | 92.32 | Henry Aaron | 91.54 | Henry Aaron | 92.24 |
| Alex Rodriguez | 88.00 | Alex Rodriguez | 87.39 | Alex Rodriguez | 87.39 |
| Greg Maddux | 87.06 | Randy Johnson | 86.09 | Randy Johnson | 86.65 |
| Randy Johnson | 86.99 | Greg Maddux | 85.75 | Greg Maddux | 86.46 |
| Stan Musial | 86.00 | Stan Musial | 84.79 | Stan Musial | 85.85 |
| Mike Schmidt | 83.36 | Mike Schmidt | 83.05 | Mike Schmidt | 83.37 |
| Ted Williams | 83.13 | Ted Williams | 82.88 | Ted Williams | 83.18 |
| Lefty Grove | 81.91 | Lefty Grove | 81.91 | Lefty Grove | 81.91 |
| Albert Pujols | 81.06 | Ty Cobb | 80.72 | Ty Cobb | 80.72 |
| Ty Cobb | 80.74 | Rickey Henderson | 79.13 | Rickey Henderson | 79.38 |
| Rickey Henderson | 80.48 | Albert Pujols | 78.84 | Albert Pujols | 78.84 |
| Justin Verlander | 77.82 | Joe Morgan | 76.00 | Joe Morgan | 77.09 |
| Joe Morgan | 77.09 | Rogers Hornsby | 75.44 | Rogers Hornsby | 75.44 |
| Cal Ripken Jr | 75.58 | Lou Gehrig | 74.56 | Bert Blyleven | 75.06 |
| Mike Trout | 75.58 | Cal Ripken Jr | 74.02 | Cal Ripken Jr | 74.81 |
| Rogers Hornsby | 75.44 | Mickey Mantle | 73.84 | Lou Gehrig | 74.56 |
| Bert Blyleven | 75.06 | Bert Blyleven | 73.78 | Mickey Mantle | 74.41 |
| Lou Gehrig | 74.56 | Tom Seaver | 73.05 | Steve Carlton | 74.40 |
| Mickey Mantle | 74.40 | Steve Carlton | 72.97 | Tom Seaver | 74.13 |
| Steve Carlton | 74.40 | Wade Boggs | 72.65 | Wade Boggs | 73.05 |
| Tom Seaver | 74.15 | Mel Ott | 72.64 | Mel Ott | 72.64 |
In this section, we use different talent-generating processes to verify the robustness of the model.
The table below shows the top 25 players by era-adjusted bWAR under four different talent-generating processes: talent following the standard normal distribution, the folded normal distribution, the Pareto distribution with \(\alpha = 3\), and the Pareto distribution with \(\alpha = 1.16\). The ranking lists produced by the four processes agree closely. For example, ranks 1 to 9 are the same under all four, and the only difference among the first 12 ranks is that Albert Pujols moves from rank 12 to rank 10 under the Pareto distribution with \(\alpha = 1.16\). We therefore regard our Full House model’s talent-generating process as fairly robust.
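One way to see why the choice of talent distribution can leave rankings largely unchanged: if talents are obtained by passing the same percentiles through different quantile functions, the ordering of the talents cannot change, because every quantile function is nondecreasing. The sketch below illustrates this with hypothetical percentiles u. The Pareto scale of 1 and the helper names are assumptions made for illustration, and the code does not reproduce the era-adjustment pipeline itself.

```r
set.seed(2023)
u <- runif(200)   # hypothetical talent percentiles for 200 players

# Quantile functions of the four talent distributions named above.
q_normal <- function(u) qnorm(u)
q_folded <- function(u) qnorm((1 + u) / 2)   # |Z| with Z ~ N(0, 1)
q_pareto <- function(u, alpha, scale = 1) scale * (1 - u)^(-1 / alpha)

talent <- data.frame(
  normal      = q_normal(u),
  folded      = q_folded(u),
  pareto_3    = q_pareto(u, alpha = 3),
  pareto_1.16 = q_pareto(u, alpha = 1.16)
)

# Rank players within each column and compare every column with the first.
ranks <- sapply(talent, rank)
all(ranks == ranks[, "normal"])
```

The talent values differ greatly in scale across the four columns, especially in the heavy-tailed Pareto case, while the ordering of the players is the same in each.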
| rank | standard normal | era-adjusted bWAR | Folded normal (mu = 0, sigma = 1) | era-adjusted bWAR | Pareto with alpha = 3 | era-adjusted bWAR | Pareto with alpha = 1.16 | era-adjusted bWAR |
|---|---|---|---|---|---|---|---|---|
| 1 | Barry Bonds | 153.93482003704 | Barry Bonds | 153.934820057079 | Barry Bonds | 153.93482003704 | Barry Bonds | 153.900596672323 |
| 2 | Roger Clemens | 145.907695934742 | Roger Clemens | 145.907695925218 | Roger Clemens | 145.907695934742 | Roger Clemens | 145.907695934742 |
| 3 | Willie Mays | 144.20526296462 | Willie Mays | 144.205262974907 | Willie Mays | 144.20526296462 | Willie Mays | 144.095468112426 |
| 4 | Henry Aaron | 135.560701275838 | Henry Aaron | 135.560701289904 | Henry Aaron | 135.560701275838 | Henry Aaron | 135.601861141104 |
| 5 | Babe Ruth | 132.702324888145 | Babe Ruth | 132.702324893792 | Babe Ruth | 132.702324888145 | Babe Ruth | 132.702324888145 |
| 6 | Stan Musial | 119.25776183457 | Stan Musial | 119.257761829684 | Stan Musial | 119.25776183457 | Stan Musial | 119.509994291424 |
| 7 | Alex Rodriguez | 119.05928887207 | Alex Rodriguez | 119.059288892948 | Alex Rodriguez | 119.05928887207 | Alex Rodriguez | 119.066070765468 |
| 8 | Greg Maddux | 113.670801140907 | Greg Maddux | 113.670801146666 | Greg Maddux | 113.670801140907 | Greg Maddux | 113.670801140907 |
| 9 | Ty Cobb | 112.000334920716 | Ty Cobb | 112.000334926544 | Ty Cobb | 112.000334920716 | Ty Cobb | 112.003670261108 |
| 10 | Randy Johnson | 110.812019546253 | Randy Johnson | 110.812019560192 | Randy Johnson | 110.812019546253 | Albert Pujols | 111.852368250334 |
| 11 | Mike Schmidt | 109.61435217199 | Mike Schmidt | 109.614352185456 | Mike Schmidt | 109.61435217199 | Randy Johnson | 110.812019546253 |
| 12 | Albert Pujols | 109.147332313036 | Albert Pujols | 109.147332303033 | Albert Pujols | 109.147332313036 | Mike Schmidt | 109.591837967447 |
| 13 | Rickey Henderson | 109.137893304437 | Rickey Henderson | 109.137893313238 | Rickey Henderson | 109.137893304437 | Rickey Henderson | 109.063644359715 |
| 14 | Ted Williams | 108.155500712145 | Ted Williams | 108.155500712963 | Ted Williams | 108.155500712145 | Ted Williams | 108.037777234477 |
| 15 | Tom Seaver | 104.28306193596 | Tom Seaver | 104.283061916776 | Tom Seaver | 104.28306193596 | Tom Seaver | 104.305946556062 |
| 16 | Lefty Grove | 101.582214503872 | Lefty Grove | 101.582214523742 | Lefty Grove | 101.582214503872 | Lefty Grove | 101.582214503872 |
| 17 | Tris Speaker | 100.72466821009 | Tris Speaker | 100.72466820624 | Tris Speaker | 100.72466821009 | Tris Speaker | 100.699554124141 |
| 18 | Joe Morgan | 100.182322428416 | Joe Morgan | 100.182322426815 | Joe Morgan | 100.182322428416 | Justin Verlander | 100.240996144687 |
| 19 | Frank Robinson | 100.036273382032 | Frank Robinson | 100.036273385659 | Frank Robinson | 100.036273382032 | Joe Morgan | 100.168379881699 |
| 20 | Bert Blyleven | 97.6905454153586 | Bert Blyleven | 97.6905454263732 | Bert Blyleven | 97.6905454153586 | Frank Robinson | 99.9106587464703 |
| 21 | Cal Ripken Jr | 97.4212949564655 | Cal Ripken Jr | 97.4212949308992 | Cal Ripken Jr | 97.4212949564655 | Bert Blyleven | 97.6905454153586 |
| 22 | Mel Ott | 96.690996783323 | Mel Ott | 96.6909967816042 | Mel Ott | 96.690996783323 | Cal Ripken Jr | 97.4130854322836 |
| 23 | Rogers Hornsby | 95.7583931883965 | Rogers Hornsby | 95.7583931873413 | Rogers Hornsby | 95.7583931883965 | Mel Ott | 96.716110869205 |
| 24 | Lou Gehrig | 95.6758386877758 | Lou Gehrig | 95.675838706285 | Lou Gehrig | 95.6758386877758 | Rogers Hornsby | 95.7583931883965 |
| 25 | Mickey Mantle | 95.4051563355476 | Mickey Mantle | 95.4051563322911 | Mickey Mantle | 95.4051563355476 | Lou Gehrig | 95.6758386877758 |
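
One intuition for why the rankings above change so little can be sketched in a few lines of base R. Each talent distribution maps a percentile to a latent talent through its quantile function, and every quantile function is strictly increasing, so the ordering of players is preserved whichever distribution is used. The snippet below is a minimal illustration under stated assumptions, not the report's pipeline: the object names (`q_talent`, `talent`, `ranks`, `all_same`), the Pareto scale of 1, and the simulated percentiles are all hypothetical, and it does not attempt to reproduce the era-adjusted values in the table.

```r
# Illustrative sketch only: all names here are hypothetical, not from the report's code.
# Quantile functions (percentile -> latent talent) for the four talent distributions.
q_talent <- list(
  normal        = function(p) qnorm(p),            # standard normal
  folded_normal = function(p) qnorm((1 + p) / 2),  # |Z| with Z ~ N(0, 1)
  pareto_3      = function(p) (1 - p)^(-1 / 3),    # Pareto, scale 1 (assumed), alpha = 3
  pareto_1.16   = function(p) (1 - p)^(-1 / 1.16)  # Pareto, scale 1 (assumed), alpha = 1.16
)

set.seed(1)
p <- runif(25)  # made-up percentiles for 25 hypothetical players

# One column of latent talent per distribution
talent <- sapply(q_talent, function(q) q(p))

# Rank players within each column (1 = most talented)
ranks <- apply(-talent, 2, rank)

# TRUE when every player holds the same rank under all four distributions
all_same <- all(apply(ranks, 1, function(r) length(unique(r)) == 1))
print(all_same)
```

The latent talent values differ greatly in scale across the four columns of `talent`, especially in the heavy-tailed Pareto case, but the ranks do not. This shows only the rank-preserving property of the percentile-to-talent map. Any differences in the full analysis, such as those seen in the Pareto (\(\alpha = 1.16\)) column, would have to come from other steps of the procedure.
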
| name | BA | name | BA | name | year | BA | name | year | BA |
|---|---|---|---|---|---|---|---|---|---|
| Tony Gwynn | 0.342 | Tony Gwynn | 0.338 | Jose Altuve | 2014-2017 | 0.367 | Tony Gwynn | 1994-1997 | 0.360 |
| Rod Carew | 0.329 | Ty Cobb | 0.332 | Tony Gwynn | 1994-1997 | 0.366 | Ty Cobb | 1916-1919 | 0.353 |
| Jose Altuve | 0.327 | Rod Carew | 0.324 | Rod Carew | 1974-1977 | 0.363 | Wade Boggs | 1985-1988 | 0.351 |
| Ichiro Suzuki | 0.327 | Ichiro Suzuki | 0.322 | Miguel Cabrera | 2010-2013 | 0.355 | Rod Carew | 1974-1977 | 0.350 |
| Miguel Cabrera | 0.320 | Jose Altuve | 0.320 | Wade Boggs | 1985-1988 | 0.353 | Ichiro Suzuki | 2001-2004 | 0.348 |
| Roberto Clemente | 0.320 | Roberto Clemente | 0.318 | Ichiro Suzuki | 2001-2004 | 0.353 | Rogers Hornsby | 1921-1924 | 0.346 |
| Ty Cobb | 0.320 | Joe DiMaggio | 0.318 | Barry Bonds | 2001-2004 | 0.352 | Jose Altuve | 2014-2017 | 0.345 |
| Joe DiMaggio | 0.318 | Shoeless Joe Jackson | 0.316 | Joe Mauer | 2006-2009 | 0.350 | Mike Piazza | 1995-1998 | 0.345 |
| Wade Boggs | 0.316 | Wade Boggs | 0.314 | Roberto Clemente | 1964-1967 | 0.345 | Barry Bonds | 2001-2004 | 0.344 |
| Buster Posey | 0.316 | Freddie Freeman | 0.314 | Joe DiMaggio | 1938-1941 | 0.345 | Joe DiMaggio | 1939-1942 | 0.343 |
| Mike Trout | 0.315 | Stan Musial | 0.314 | Albert Pujols | 2003-2006 | 0.343 | Don Mattingly | 1984-1987 | 0.340 |
| Freddie Freeman | 0.314 | Ted Williams | 0.314 | Don Mattingly | 1984-1987 | 0.341 | Henry Aaron | 1956-1959 | 0.339 |
| Joe Mauer | 0.314 | Henry Aaron | 0.313 | Mike Piazza | 1995-1998 | 0.341 | Roberto Clemente | 1969-1972 | 0.338 |
| Ted Williams | 0.314 | Buster Posey | 0.313 | Willie Mays | 1957-1960 | 0.340 | Stan Musial | 1943-1946 | 0.338 |
| Stan Musial | 0.313 | Mike Trout | 0.312 | Matty Alou | 1966-1969 | 0.339 | Joe Mauer | 2006-2009 | 0.337 |
| Willie Mays | 0.312 | Matty Alou | 0.311 | Tim Anderson | 2019-2022 | 0.338 | Miguel Cabrera | 2010-2013 | 0.336 |
| Bill Terry | 0.312 | Miguel Cabrera | 0.311 | Stan Musial | 1943-1946 | 0.338 | Nap Lajoie | 1901-1904 | 0.336 |
| Robinson Canó | 0.311 | Robinson Canó | 0.311 | Rogers Hornsby | 1922-1925 | 0.335 | Albert Pujols | 2003-2006 | 0.336 |
| Henry Aaron | 0.310 | Vladimir Guerrero | 0.311 | Ted Williams | 1943-1946 | 0.335 | Matty Alou | 1966-1969 | 0.335 |
| Matty Alou | 0.310 | Joe Mauer | 0.311 | Ty Cobb | 1912-1915 | 0.334 | Honus Wagner | 1905-1908 | 0.335 |
| Vladimir Guerrero | 0.310 | Rogers Hornsby | 0.310 | Trea Turner | 2019-2022 | 0.334 | Tris Speaker | 1913-1916 | 0.334 |
| Derek Jeter | 0.310 | Willie Mays | 0.310 | Henry Aaron | 1956-1959 | 0.333 | Ted Williams | 1943-1946 | 0.334 |
| Al Oliver | 0.310 | Kirby Puckett | 0.310 | Cecil Cooper | 1980-1983 | 0.333 | Freddie Freeman | 2020-2023 | 0.332 |
| Lou Gehrig | 0.309 | Bill Terry | 0.310 | Freddie Freeman | 2020-2023 | 0.333 | Lou Gehrig | 1932-1935 | 0.332 |
| Edgar Martinez | 0.309 | Lou Gehrig | 0.309 | Nap Lajoie | 1901-1904 | 0.333 | Willie Mays | 1957-1960 | 0.332 |